A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost. This benefits hotel guests, but it is less desirable for hotels and can reduce their revenue. Such losses are particularly high for last-minute cancellations.
The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. INN Hotels Group has a chain of hotels in Portugal. They are facing problems with the high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help formulate profitable policies for cancellations and refunds.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
# this will help in making the Python code more structured automatically (help adhere to good coding practices)
%reload_ext nb_black
import warnings
warnings.filterwarnings("ignore")
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter("ignore", ConvergenceWarning)
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# setting the precision of floating numbers to 5 decimal points
pd.set_option("display.float_format", lambda x: "%.5f" % x)
# Library to split data
from sklearn.model_selection import train_test_split
# To build model for prediction
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# To tune different models
from sklearn.model_selection import GridSearchCV
# To get different metric scores
from sklearn.metrics import (
f1_score,
accuracy_score,
recall_score,
precision_score,
confusion_matrix,
roc_auc_score,
plot_confusion_matrix,
precision_recall_curve,
roc_curve,
make_scorer,
)
# lets load the dataset using pandas read_csv
data = pd.read_csv(
"C:/Users/SridharGudimella/Desktop/DS training Notebooks/Project5 Decision Tree/INNHotelsGroup.csv"
)
# lets verify if the dataset is loaded or not using head() to see the first 5 rows in the dataset
data.head()
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00000 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68000 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00000 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00000 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50000 | 0 | Canceled |
# copying data to another variable to avoid any changes to original data
hotel = data.copy()
# lets see the first 5 rows in the dataset
hotel.head(5)
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00000 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68000 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00000 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00000 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50000 | 0 | Canceled |
# lets check the tail of the dataset
hotel.tail()
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36270 | INN36271 | 3 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 4 | 85 | 2018 | 8 | 3 | Online | 0 | 0 | 0 | 167.80000 | 1 | Not_Canceled |
| 36271 | INN36272 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 228 | 2018 | 10 | 17 | Online | 0 | 0 | 0 | 90.95000 | 2 | Canceled |
| 36272 | INN36273 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39000 | 2 | Not_Canceled |
| 36273 | INN36274 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50000 | 0 | Canceled |
| 36274 | INN36275 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 207 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67000 | 0 | Not_Canceled |
# lets check the shape of the dataset
hotel.shape
print("Insights")
print("Dataset has", hotel.shape[0], "rows")
print("Dataset has", hotel.shape[1], "columns")
Insights
Dataset has 36275 rows
Dataset has 19 columns
# lets review the dataset columns
hotel.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 19 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   Booking_ID                            36275 non-null  object
 1   no_of_adults                          36275 non-null  int64
 2   no_of_children                        36275 non-null  int64
 3   no_of_weekend_nights                  36275 non-null  int64
 4   no_of_week_nights                     36275 non-null  int64
 5   type_of_meal_plan                     36275 non-null  object
 6   required_car_parking_space            36275 non-null  int64
 7   room_type_reserved                    36275 non-null  object
 8   lead_time                             36275 non-null  int64
 9   arrival_year                          36275 non-null  int64
 10  arrival_month                         36275 non-null  int64
 11  arrival_date                          36275 non-null  int64
 12  market_segment_type                   36275 non-null  object
 13  repeated_guest                        36275 non-null  int64
 14  no_of_previous_cancellations          36275 non-null  int64
 15  no_of_previous_bookings_not_canceled  36275 non-null  int64
 16  avg_price_per_room                    36275 non-null  float64
 17  no_of_special_requests                36275 non-null  int64
 18  booking_status                        36275 non-null  object
dtypes: float64(1), int64(13), object(5)
memory usage: 5.3+ MB
# convert the object datatypes to category
col_list = hotel.select_dtypes(["object"]).columns
for i in col_list:
    hotel[i] = hotel[i].astype("category")
hotel.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 19 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   Booking_ID                            36275 non-null  category
 1   no_of_adults                          36275 non-null  int64
 2   no_of_children                        36275 non-null  int64
 3   no_of_weekend_nights                  36275 non-null  int64
 4   no_of_week_nights                     36275 non-null  int64
 5   type_of_meal_plan                     36275 non-null  category
 6   required_car_parking_space            36275 non-null  int64
 7   room_type_reserved                    36275 non-null  category
 8   lead_time                             36275 non-null  int64
 9   arrival_year                          36275 non-null  int64
 10  arrival_month                         36275 non-null  int64
 11  arrival_date                          36275 non-null  int64
 12  market_segment_type                   36275 non-null  category
 13  repeated_guest                        36275 non-null  int64
 14  no_of_previous_cancellations          36275 non-null  int64
 15  no_of_previous_bookings_not_canceled  36275 non-null  int64
 16  avg_price_per_room                    36275 non-null  float64
 17  no_of_special_requests                36275 non-null  int64
 18  booking_status                        36275 non-null  category
dtypes: category(5), float64(1), int64(13)
memory usage: 5.4 MB
Dataset has 5 categorical columns.
Dataset has 14 numerical columns.
# lets check the missing data/null values in the dataset
hotel.isnull().sum()
Booking_ID                              0
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64
Dataset has no null values
# lets check the statistical summary of the data using describe()
hotel.describe(include="all").T
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Booking_ID | 36275 | 36275 | INN00001 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| no_of_adults | 36275.00000 | NaN | NaN | NaN | 1.84496 | 0.51871 | 0.00000 | 2.00000 | 2.00000 | 2.00000 | 4.00000 |
| no_of_children | 36275.00000 | NaN | NaN | NaN | 0.10528 | 0.40265 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 10.00000 |
| no_of_weekend_nights | 36275.00000 | NaN | NaN | NaN | 0.81072 | 0.87064 | 0.00000 | 0.00000 | 1.00000 | 2.00000 | 7.00000 |
| no_of_week_nights | 36275.00000 | NaN | NaN | NaN | 2.20430 | 1.41090 | 0.00000 | 1.00000 | 2.00000 | 3.00000 | 17.00000 |
| type_of_meal_plan | 36275 | 4 | Meal Plan 1 | 27835 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| required_car_parking_space | 36275.00000 | NaN | NaN | NaN | 0.03099 | 0.17328 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| room_type_reserved | 36275 | 7 | Room_Type 1 | 28130 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| lead_time | 36275.00000 | NaN | NaN | NaN | 85.23256 | 85.93082 | 0.00000 | 17.00000 | 57.00000 | 126.00000 | 443.00000 |
| arrival_year | 36275.00000 | NaN | NaN | NaN | 2017.82043 | 0.38384 | 2017.00000 | 2018.00000 | 2018.00000 | 2018.00000 | 2018.00000 |
| arrival_month | 36275.00000 | NaN | NaN | NaN | 7.42365 | 3.06989 | 1.00000 | 5.00000 | 8.00000 | 10.00000 | 12.00000 |
| arrival_date | 36275.00000 | NaN | NaN | NaN | 15.59700 | 8.74045 | 1.00000 | 8.00000 | 16.00000 | 23.00000 | 31.00000 |
| market_segment_type | 36275 | 5 | Online | 23214 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| repeated_guest | 36275.00000 | NaN | NaN | NaN | 0.02564 | 0.15805 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| no_of_previous_cancellations | 36275.00000 | NaN | NaN | NaN | 0.02335 | 0.36833 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 13.00000 |
| no_of_previous_bookings_not_canceled | 36275.00000 | NaN | NaN | NaN | 0.15341 | 1.75417 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 58.00000 |
| avg_price_per_room | 36275.00000 | NaN | NaN | NaN | 103.42354 | 35.08942 | 0.00000 | 80.30000 | 99.45000 | 120.00000 | 540.00000 |
| no_of_special_requests | 36275.00000 | NaN | NaN | NaN | 0.61966 | 0.78624 | 0.00000 | 0.00000 | 0.00000 | 1.00000 | 5.00000 |
| booking_status | 36275 | 2 | Not_Canceled | 24390 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Quantitative Data:
The maximum number of adults per booking is 4 and the median is 2.
The maximum number of weekend nights is 7.
The median number of week nights is 2 and the maximum is 17.
Type of meal plan is categorical and has 4 types.
Lead time has a maximum of 443 days and a median of 57 days.
Arrival years span 2017 to 2018; arrival month ranges from 1 to 12 with a median of 8 (August).
The maximum number of previous cancellations is 13.
Booking status is categorical with two values: Canceled and Not_Canceled.
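The medians and maxima quoted above can also be pulled out of `describe()` directly instead of being read off the table by eye. The following is a minimal sketch on a small hypothetical frame (`toy` and its values are made up for illustration and are not the hotel data); the only thing it relies on is that `describe()` reports the median under the `50%` label.

```python
import pandas as pd

# Hypothetical toy frame standing in for the hotel data (values are made up)
toy = pd.DataFrame(
    {
        "no_of_adults": [2, 2, 1, 4, 2],
        "lead_time": [224, 5, 1, 211, 48],
    }
)

# describe() labels the median as "50%"; keep only median and max per column
summary = toy.describe().T[["50%", "max"]].rename(columns={"50%": "median"})
print(summary)
```

Reading `median` and `max` from named columns makes it harder to swap the two when writing up observations.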
# lets check if we have any duplicates
hotel.duplicated().sum()
0
No duplicate rows are in the dataset provided.
# lets check the unique values in the Booking_ID column
hotel["Booking_ID"].nunique()
36275
Booking_ID has 36275 unique values, one per row.
# lets drop the booking Id column
hotel.drop(["Booking_ID"], axis=1, inplace=True)
Leading Questions:
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 6))
    else:
        plt.figure(figsize=(n + 2, 6))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n],
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # height of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the bar with its label
    plt.show()  # show the plot
# function to plot stacked bar chart
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 6))
    plt.legend(
        loc="lower left", frameon=False,
    )
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
# lets define function to draw both the histogram and boxplot
def histogram_boxplot(data, feature, figsize=(15, 10), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (15,10))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle will indicate the mean value of the column
    # For histogram: pass bins only when a value was supplied
    if bins:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins)
    else:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2)
    ax_hist2.axvline(
        data[feature].mean(), color="red", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="yellow", linestyle="-"
    )  # Add median to the histogram
labeled_barplot(hotel, "booking_status", perc=True)
About 67.2% of the bookings are not canceled.
About 32.8% of the bookings are canceled.
# lets see the unique values in no_of_adults
np.sort(hotel["no_of_adults"].unique())
array([0, 1, 2, 3, 4], dtype=int64)
histogram_boxplot(hotel, "no_of_adults")
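The 25th, 50th, and 75th percentiles of the number of adults are all 2, so the IQR is 0 and the boxplot flags every other value (0, 1, 3, 4) as an outlier.
Around 25000 bookings have 2 adults.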
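One way to check for outliers numerically rather than by eye is to apply the boxplot's 1.5 × IQR rule directly. The sketch below uses a small hypothetical series (`adults` is made up, not the hotel column) whose quartiles all equal 2, which mirrors the `describe()` output for `no_of_adults`. When the IQR is 0, both whiskers collapse onto the quartile and every other value is flagged.

```python
import pandas as pd

# Hypothetical series: most values are 2, so Q1 = Q3 = 2
adults = pd.Series([2, 2, 2, 2, 2, 2, 1, 3, 0, 4])

q1 = adults.quantile(0.25)
q3 = adults.quantile(0.75)
iqr = q3 - q1

# Boxplot whisker limits under the 1.5 * IQR rule
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the whiskers are drawn as outlier points
flagged = adults[(adults < lower) | (adults > upper)]
print(iqr, sorted(flagged.tolist()))
```

In a case like this the flagged points are legitimate values rather than errors, so treating them as outliers to be capped would likely be a mistake.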
# lets check the no of children unique values
np.sort(hotel["no_of_children"].unique())
array([ 0, 1, 2, 3, 9, 10], dtype=int64)
histogram_boxplot(hotel, "no_of_children")
More than 30000 bookings have 0 or 1 children.
There are few bookings with 2 children.
The values 9 and 10 also appear and may be data-entry errors.
np.sort(hotel["no_of_weekend_nights"].unique())
array([0, 1, 2, 3, 4, 5, 6, 7], dtype=int64)
histogram_boxplot(hotel, "no_of_weekend_nights")
labeled_barplot(hotel, "no_of_weekend_nights", perc=True)
46.5% of the bookings have 0 weekend nights.
np.sort(hotel["no_of_week_nights"].unique())
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17], dtype=int64)
histogram_boxplot(hotel, "no_of_week_nights")
labeled_barplot(hotel, "no_of_week_nights", perc=True)
Insights:
31.5% of the bookings have no of week nights as 2.
hotel["type_of_meal_plan"].unique()
['Meal Plan 1', 'Not Selected', 'Meal Plan 2', 'Meal Plan 3'] Categories (4, object): ['Meal Plan 1', 'Meal Plan 2', 'Meal Plan 3', 'Not Selected']
labeled_barplot(hotel, "type_of_meal_plan", perc=True)
Insights:
Around 77% of the bookings have Meal Plan 1, i.e., breakfast.
Most bookings opt for the breakfast plan.
hotel["required_car_parking_space"].unique()
array([0, 1], dtype=int64)
histogram_boxplot(hotel, "required_car_parking_space")
labeled_barplot(hotel, "required_car_parking_space", perc=True)
Almost 97% of the bookings do not require a car parking space.
The hotels may be well connected by public transport, or may have free parking available to everyone, so most guests do not need a reserved parking space.
hotel["room_type_reserved"].unique()
['Room_Type 1', 'Room_Type 4', 'Room_Type 2', 'Room_Type 6', 'Room_Type 5', 'Room_Type 7', 'Room_Type 3'] Categories (7, object): ['Room_Type 1', 'Room_Type 2', 'Room_Type 3', 'Room_Type 4', 'Room_Type 5', 'Room_Type 6', 'Room_Type 7']
labeled_barplot(hotel, "room_type_reserved", perc=True)
Insights:
Around 77% of the bookings have Room_Type 1 reserved.
Room_Type 3 does appear in the data, but its share of bookings is negligible.
np.sort(hotel["lead_time"].unique())
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,
104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181,
182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,
208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220,
221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233,
234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246,
247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259,
260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272,
273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285,
286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298,
299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311,
313, 314, 315, 317, 318, 319, 320, 322, 323, 324, 325, 326, 327,
328, 330, 331, 332, 333, 335, 336, 338, 341, 345, 346, 348, 349,
350, 351, 352, 353, 355, 359, 361, 372, 377, 381, 386, 418, 433,
443], dtype=int64)
histogram_boxplot(hotel, "lead_time")
Lead time is highly right-skewed (mean about 85, median 57, maximum 443), with the highest concentration of bookings near 0.
hotel["arrival_year"].unique()
array([2017, 2018], dtype=int64)
histogram_boxplot(hotel, "arrival_year")
About 82% of the bookings have arrival year 2018 (the column mean is 2017.82).
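Because `arrival_year` takes only the values 2017 and 2018, the share of 2018 bookings can be recovered from the column mean reported by `describe()`: if a fraction p of rows is 2018, the mean is 2017 + p. The sketch below works through that arithmetic; the variable names are illustrative only.

```python
# Mean of arrival_year taken from the describe() output above
mean_arrival_year = 2017.82043

# With only 2017 and 2018 present, mean = 2017 + share_2018
share_2018 = mean_arrival_year - 2017
share_2017 = 1 - share_2018

print(f"2018: {share_2018:.1%}, 2017: {share_2017:.1%}")
```

On the real frame, `hotel["arrival_year"].value_counts(normalize=True)` should give the same shares without going through the mean.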
np.sort(hotel["arrival_month"].unique())
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], dtype=int64)
histogram_boxplot(hotel, "arrival_month")
labeled_barplot(hotel, "arrival_month", perc=True)
October has the most bookings, more than 5000 (around 15%).
October (month 10) is the busiest month of the year for INN Hotels.
np.sort(hotel["arrival_date"].unique())
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
dtype=int64)
histogram_boxplot(hotel, "arrival_date")
labeled_barplot(hotel, "arrival_date", perc=True)
Arrival dates are spread almost uniformly across the days of the month.
hotel["market_segment_type"].unique()
['Offline', 'Online', 'Corporate', 'Aviation', 'Complementary'] Categories (5, object): ['Aviation', 'Complementary', 'Corporate', 'Offline', 'Online']
labeled_barplot(hotel, "market_segment_type", perc=True)
Insights:
The online market segment contributes about 64% of the bookings.
The next largest market segment is offline (possibly in-person or call-agent bookings).
hotel["repeated_guest"].unique()
array([0, 1], dtype=int64)
labeled_barplot(hotel, "repeated_guest", perc=True)
Insights:
Around 97% of the bookings are not from repeated guests.
Only 2.6% of the bookings are from repeated guests.
histogram_boxplot(hotel, "no_of_previous_cancellations")
Almost all bookings have 0 previous cancellations; a few have up to 13.
np.sort(hotel["no_of_previous_bookings_not_canceled"].unique())
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58], dtype=int64)
histogram_boxplot(hotel, "no_of_previous_bookings_not_canceled")
For almost all the bookings the no of previous bookings not canceled was 0.
np.sort(hotel["avg_price_per_room"].unique())
array([0.000e+00, 5.000e-01, 1.000e+00, ..., 3.650e+02, 3.755e+02,
5.400e+02])
histogram_boxplot(hotel, "avg_price_per_room")
Average price per room shows an almost normal distribution but with a long right tail.
hotel[hotel["avg_price_per_room"] == 0]
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 2 | 2017 | 9 | 10 | Complementary | 0 | 0 | 0 | 20.75000 | 1 | 0 |
| 145 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 13 | 2018 | 6 | 1 | Complementary | 1 | 3 | 5 | 20.75000 | 1 | 0 |
| 209 | 1 | 0 | 0 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 4 | 2018 | 2 | 27 | Complementary | 0 | 0 | 0 | 20.75000 | 1 | 0 |
| 266 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2017 | 8 | 12 | Complementary | 1 | 0 | 1 | 20.75000 | 1 | 0 |
| 267 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 4 | 2017 | 8 | 23 | Complementary | 0 | 0 | 0 | 20.75000 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 35983 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 7 | 0 | 2018 | 6 | 7 | Complementary | 1 | 4 | 17 | 20.75000 | 1 | 0 |
| 36080 | 1 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 7 | 0 | 2018 | 3 | 21 | Complementary | 1 | 3 | 15 | 20.75000 | 1 | 0 |
| 36114 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 3 | 2 | Online | 0 | 0 | 0 | 20.75000 | 0 | 0 |
| 36217 | 2 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 2 | 3 | 2017 | 8 | 9 | Online | 0 | 0 | 0 | 20.75000 | 2 | 0 |
| 36250 | 1 | 0 | 0 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 6 | 2017 | 12 | 10 | Online | 0 | 0 | 0 | 20.75000 | 0 | 0 |
545 rows × 18 columns
hotel.loc[hotel["avg_price_per_room"] == 0, "market_segment_type"].value_counts()
Complementary    354
Online           191
Aviation           0
Corporate          0
Offline            0
Name: market_segment_type, dtype: int64
# Calculating the 25th quantile
Q1 = hotel["avg_price_per_room"].quantile(0.25)
# Calculating the 75th quantile
Q3 = hotel["avg_price_per_room"].quantile(0.75)
# Calculating IQR
IQR = Q3 - Q1
# Calculating value of upper whisker
Upper_Whisker = Q3 + 1.5 * IQR
Upper_Whisker
179.55
hotel.loc[hotel["avg_price_per_room"] >= 179]
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 60 | 2 | 2 | 0 | 1 | Meal Plan 1 | 1 | Room_Type 6 | 2 | 2018 | 9 | 2 | Online | 0 | 0 | 0 | 258.00000 | 1 | Not_Canceled |
| 114 | 2 | 2 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 2 | 116 | 2018 | 6 | 26 | Online | 0 | 0 | 0 | 184.24000 | 1 | Canceled |
| 127 | 2 | 2 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 6 | 110 | 2018 | 10 | 14 | Online | 0 | 0 | 0 | 190.80000 | 0 | Canceled |
| 162 | 3 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 7 | 3 | 2018 | 10 | 7 | Online | 0 | 0 | 0 | 215.60000 | 1 | Not_Canceled |
| 227 | 1 | 2 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 6 | 4 | 2017 | 9 | 19 | Online | 0 | 0 | 0 | 200.75000 | 2 | Not_Canceled |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 36133 | 3 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 4 | 0 | 2018 | 8 | 23 | Online | 0 | 0 | 0 | 225.00000 | 2 | Not_Canceled |
| 36172 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 1 | 2018 | 9 | 27 | Online | 0 | 0 | 0 | 205.00000 | 1 | Not_Canceled |
| 36221 | 2 | 2 | 2 | 4 | Meal Plan 1 | 0 | Room_Type 6 | 62 | 2018 | 9 | 24 | Online | 0 | 0 | 0 | 207.90000 | 1 | Not_Canceled |
| 36227 | 2 | 2 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 6 | 20 | 2018 | 8 | 6 | Online | 0 | 0 | 0 | 231.00000 | 1 | Not_Canceled |
| 36269 | 2 | 2 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 6 | 0 | 2018 | 10 | 6 | Online | 0 | 0 | 0 | 216.00000 | 0 | Canceled |
1089 rows × 18 columns
hotel.loc[hotel["avg_price_per_room"] >= 200, "avg_price_per_room"] = Upper_Whisker
histogram_boxplot(hotel, "avg_price_per_room")
# Calculating the 25th quantile
Q1 = hotel["avg_price_per_room"].quantile(0.25)
# Calculating the 75th quantile
Q3 = hotel["avg_price_per_room"].quantile(0.75)
# Calculating IQR
IQR = Q3 - Q1
# Calculating value of lower whisker
Lower_Whisker = Q1 - 1.5 * IQR
Lower_Whisker
20.749999999999993
hotel.loc[hotel["avg_price_per_room"] <= 25, "avg_price_per_room"] = Lower_Whisker
histogram_boxplot(hotel, "avg_price_per_room")
Average price per room now shows an almost normal distribution.
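The cells above cap prices at hand-picked cut-offs (200 and 25) and assign the whisker values to them. As an alternative sketch, not the method used in this notebook, the computed whiskers can be used as the cut-offs themselves via `Series.clip`, so that everything beyond a whisker is pulled back to it. The helper name `cap_to_whiskers` and the `prices` values are hypothetical.

```python
import pandas as pd


def cap_to_whiskers(s: pd.Series) -> pd.Series:
    """Clip a numeric series to its 1.5 * IQR whisker limits."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    # Values below the lower whisker or above the upper whisker are
    # replaced by the whisker value; everything else is unchanged
    return s.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)


# Hypothetical prices with one very low and one very high value
prices = pd.Series([0.0, 80.0, 90.0, 100.0, 110.0, 120.0, 540.0])
capped = cap_to_whiskers(prices)
print(capped.tolist())
```

Unlike the two-pass approach above, this computes both whiskers from the same quartiles in one step, so the lower limit does not depend on whether the upper cap was applied first.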
np.sort(hotel["no_of_special_requests"].unique())
array([0, 1, 2, 3, 4, 5], dtype=int64)
histogram_boxplot(hotel, "no_of_special_requests")
labeled_barplot(hotel, "no_of_special_requests", perc=True)
Around 31% of the bookings have exactly one special request, and about 55% have none.
hotel["booking_status"].unique()
['Not_Canceled', 'Canceled'] Categories (2, object): ['Canceled', 'Not_Canceled']
labeled_barplot(hotel, "booking_status", perc=True)
hotel["booking_status"] = hotel["booking_status"].apply(
lambda x: 1 if x == "Canceled" else 0
)
labeled_barplot(hotel, "booking_status", perc=True)
Around 67% of the bookings are not canceled.
About 33% of the bookings are canceled, which is still a substantial share.
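Once the target is encoded as 0/1, the cancellation share is simply the mean of the column. The sketch below shows this on a small hypothetical categorical series (`status` is made up); it uses a boolean comparison instead of `apply` with a lambda, which is an alternative encoding and not the one used in the cell above.

```python
import pandas as pd

# Hypothetical categorical target (values are made up)
status = pd.Series(
    ["Not_Canceled", "Canceled", "Canceled", "Not_Canceled", "Not_Canceled"],
    dtype="category",
)

# 1 for Canceled, 0 otherwise; the comparison yields booleans, cast to int
encoded = (status == "Canceled").astype(int)

# The mean of a 0/1 column is the fraction of ones
cancel_rate = encoded.mean()
print(encoded.tolist(), cancel_rate)
```

A side effect of this encoding is that the result has an integer dtype, so it is picked up by `select_dtypes(include=np.number)`.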
plt.figure(figsize=(12, 7))
sns.countplot(data=hotel, x="arrival_month", hue="booking_status")
plt.show()
October has the highest number of bookings and also the highest number of cancellations.
January has the fewest cancellations.
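Raw counts favor busy months, so the month with the most cancellations is not necessarily the month with the highest cancellation rate. A minimal sketch of the rate calculation, assuming `booking_status` is already encoded as 0/1, is shown below on a hypothetical toy frame (the values are made up).

```python
import pandas as pd

# Hypothetical bookings: month 10 is busier than month 1
toy = pd.DataFrame(
    {
        "arrival_month": [1, 1, 10, 10, 10, 10],
        "booking_status": [0, 0, 1, 1, 0, 0],  # 1 = canceled
    }
)

# Mean of the 0/1 target per month is the cancellation rate for that month
monthly_rate = toy.groupby("arrival_month")["booking_status"].mean()

# Sum of the 0/1 target per month is the cancellation count
monthly_count = toy.groupby("arrival_month")["booking_status"].sum()
print(monthly_rate)
```

Comparing `monthly_rate` with `monthly_count` separates months that are merely busy from months where bookings are unusually likely to be canceled.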
# grouping the data on arrival months and extracting the count of bookings
monthly_data = hotel.groupby(["arrival_month"])["booking_status"].count()
# creating a dataframe with months and count of customers in each month
monthly_data = pd.DataFrame(
{"Month": list(monthly_data.index), "Guests": list(monthly_data.values)}
)
# plotting the trend over different months
plt.figure(figsize=(13, 7))
sns.lineplot(data=monthly_data, x="Month", y="Guests")
plt.show()
# lets do a heatmap to understand the correlation between numerical columns
cols_list = hotel.select_dtypes(include=np.number).columns.tolist()
plt.figure(figsize=(12, 7))
sns.heatmap(
hotel[cols_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral"
)
plt.show()
Repeated guest and the number of previous bookings not canceled are positively correlated.
The number of previous cancellations and the number of previous bookings not canceled are also correlated.
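With many numeric columns, the strongest pairs can be hard to pick out of a heatmap. The sketch below lists the most correlated column pairs by absolute value; `top_correlations` is a hypothetical helper and the toy columns `a`, `b`, `c` are made up for illustration.

```python
import numpy as np
import pandas as pd


def top_correlations(df: pd.DataFrame, k: int = 3) -> pd.Series:
    """Return the k column pairs with the largest absolute correlation."""
    corr = df.select_dtypes(include=np.number).corr()
    # Keep only the upper triangle so each pair appears once and the
    # diagonal (self-correlation) is excluded
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order).head(k)


# Hypothetical data: b is an exact multiple of a, c is unrelated to a
toy = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10],
        "c": [1, 0, 1, 0, 1],
    }
)
top = top_correlations(toy, k=1)
print(top)
```

The result is indexed by column pairs, which makes it straightforward to quote specific correlations alongside the heatmap.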
# pairplot creates its own figure, so a separate plt.figure() call only adds an empty figure
sns.pairplot(
    data=hotel,
    vars=hotel.select_dtypes(include=np.number).columns.tolist(),
    corner=True,
)
plt.show()
No conclusions can be drawn from the pairplot
# function to plot distributions wrt target
def distribution_plot_wrt_target(data, predictor, target):
    fig, axs = plt.subplots(2, 2, figsize=(12, 10))
    target_uniq = data[target].unique()
    axs[0, 0].set_title("Distribution of target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
        stat="density",
    )
    axs[0, 1].set_title("Distribution of target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
        stat="density",
    )
    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")
    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )
    plt.tight_layout()
    plt.show()
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5))
    plt.legend(
        loc="lower left", frameon=False,
    )
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
plt.figure(figsize=(12, 7))
sns.countplot(data=hotel, x="market_segment_type", hue="booking_status")
plt.show()
The online market segment has the most bookings.
The online market segment also has the most cancellations.
Aviation and Complementary have almost 0 cancellations.
plt.figure(figsize=(10, 6))
sns.boxplot(
data=hotel, x="market_segment_type", y="avg_price_per_room", palette="gist_rainbow"
)
plt.show()
Median prices in the online segment are higher than in the other market segments.
The online segment also has the widest price range.
stacked_barplot(hotel, "no_of_special_requests", "booking_status")
booking_status              1      0    All
no_of_special_requests
All                     11885  24390  36275
0                        8545  11232  19777
1                        2703   8670  11373
2                         637   3727   4364
3                           0    675    675
4                           0     78     78
5                           0      8      8
------------------------------------------------------------------------------------------------------------------------
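Cancellations are most frequent when there is no special request (about 43% of such bookings are canceled). The cancellation share falls to about 24% with one request and about 15% with two, and no booking with three or more special requests was canceled.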
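The cancellation share of each group can be recomputed from the crosstab counts printed above. The sketch below is illustrative only: the counts are copied by hand from that table, `cancellation_rates` is a hypothetical helper that is not part of the notebook, and it assumes that `booking_status` 1 marks a canceled booking.

```python
# Counts copied from the crosstab printed above.
# no_of_special_requests -> (canceled, not canceled); assumes status 1 = canceled
counts = {
    0: (8545, 11232),
    1: (2703, 8670),
    2: (637, 3727),
    3: (0, 675),
    4: (0, 78),
    5: (0, 8),
}


def cancellation_rates(table):
    """Return the share of canceled bookings for each category (hypothetical helper)."""
    return {
        key: canceled / (canceled + kept) for key, (canceled, kept) in table.items()
    }


rates = cancellation_rates(counts)
for requests, rate in rates.items():
    print(f"{requests} special requests: {rate:.1%} canceled")
```

With a dataframe at hand, the same shares should come from the `normalize="index"` crosstab that `stacked_barplot` already builds for the chart.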
plt.figure(figsize=(12, 7))
sns.countplot(data=hotel, x="no_of_special_requests", hue="booking_status")
plt.show()
Cancellations are highest when there are no special requests.
plt.figure(figsize=(10, 5))
sns.boxplot(data=hotel, x="no_of_special_requests", y="avg_price_per_room")
plt.show()
The median average price per room increases with the number of special requests.
distribution_plot_wrt_target(hotel, "avg_price_per_room", "booking_status")
Insights:
distribution_plot_wrt_target(hotel, "lead_time", "booking_status")
# keep bookings with at least one adult (children may be zero) as the base for the family data
familydata = hotel[hotel["no_of_adults"] > 0].copy()
familydata
|  | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00000 | 0 | 0 |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68000 | 1 | 0 |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00000 | 0 | 1 |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00000 | 0 | 1 |
| 4 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50000 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 36270 | 3 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 4 | 85 | 2018 | 8 | 3 | Online | 0 | 0 | 0 | 167.80000 | 1 | 0 |
| 36271 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 228 | 2018 | 10 | 17 | Online | 0 | 0 | 0 | 90.95000 | 2 | 1 |
| 36272 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39000 | 2 | 0 |
| 36273 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50000 | 0 | 1 |
| 36274 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 207 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67000 | 0 | 0 |
36136 rows × 18 columns
# lets add a new column to show the no of family members
familydata["familymembers"] = familydata["no_of_adults"] + familydata["no_of_children"]
# lets see if the column has been added or not
familydata
|  | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | familymembers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00000 | 0 | 0 | 2 |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68000 | 1 | 0 | 2 |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00000 | 0 | 1 | 1 |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00000 | 0 | 1 | 2 |
| 4 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50000 | 0 | 1 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 36270 | 3 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 4 | 85 | 2018 | 8 | 3 | Online | 0 | 0 | 0 | 167.80000 | 1 | 0 | 3 |
| 36271 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 228 | 2018 | 10 | 17 | Online | 0 | 0 | 0 | 90.95000 | 2 | 1 | 2 |
| 36272 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39000 | 2 | 0 | 2 |
| 36273 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50000 | 0 | 1 | 2 |
| 36274 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 207 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67000 | 0 | 0 | 2 |
36136 rows × 19 columns
plt.figure(figsize=(12, 7))
sns.countplot(data=familydata, x="familymembers", hue="booking_status")
plt.show()
Bookings with 2 family members are the most common and also account for the most cancellations.
plt.figure(figsize=(8, 6))
sns.countplot(data=hotel, x="repeated_guest", hue="booking_status")
plt.show()
More than 20000 repeated guests bookings are not cancelled.
hotel["booking_status"] = hotel["booking_status"].astype(int)
Outlier detection is done below.
# outlier detection using boxplot
numeric_columns = hotel.select_dtypes(include=np.number).columns.tolist()
# dropping booking_status
numeric_columns.remove("booking_status")
plt.figure(figsize=(15, 12))
for i, variable in enumerate(numeric_columns):
    plt.subplot(4, 4, i + 1)
    plt.boxplot(hotel[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)
plt.show()
We want to predict which bookings will be canceled.
Before we proceed to build a model, we'll have to encode categorical features.
We'll split the data into train and test to be able to evaluate the model that we build on the train data.
X = hotel.drop("booking_status", axis=1)
Y = hotel["booking_status"]
# creating dummy variables
X = pd.get_dummies(X, drop_first=True)
# splitting in training and test set
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
print("Shape of Training set : ", X_train.shape)
print("Shape of test set : ", X_test.shape)
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Shape of Training set : (25392, 27)
Shape of test set : (10883, 27)
Percentage of classes in training set:
0   0.67064
1   0.32936
Name: booking_status, dtype: float64
Percentage of classes in test set:
0   0.67638
1   0.32362
Name: booking_status, dtype: float64
Both the cases are important as:
If we predict that a booking will not be canceled and the booking gets canceled then the hotel will lose resources and will have to bear additional costs of distribution channels.
If we predict that a booking will get canceled and the booking doesn't get canceled the hotel might not be able to provide satisfactory services to the customer by assuming that this booking will be canceled. This might damage the brand equity.
The F1 score should be maximized: the greater the F1 score, the higher the chances of minimizing both False Negatives and False Positives.
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification_statsmodels(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    # checking which probabilities are greater than threshold
    pred_temp = model.predict(predictors) > threshold
    # rounding off the above values to get classes
    pred = np.round(pred_temp)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )
    return df_perf
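The effect of the `threshold` argument in the helper above can be seen on a toy example. This is a minimal sketch in plain Python rather than the notebook's own code: `classify` and `precision_recall_f1` are hypothetical names, and the labels and probabilities are made up for illustration.

```python
def classify(probabilities, threshold=0.5):
    """Label an observation 1 when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# made-up labels and predicted probabilities, for illustration only
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.6, 0.4, 0.45, 0.2, 0.7, 0.3, 0.1]

for cutoff in (0.5, 0.35):
    y_pred = classify(y_prob, cutoff)
    p, r, f = precision_recall_f1(y_true, y_pred)
    print(f"threshold={cutoff}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

On this toy data, lowering the cut-off from 0.5 to 0.35 raises recall at the cost of precision, the same trade-off the threshold tuning later in the notebook explores.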
# defining a function to plot the confusion_matrix of a classification model
def confusion_matrix_statsmodels(model, predictors, target, threshold=0.5):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors) > threshold
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
X = hotel.drop("booking_status", axis=1)
Y = hotel["booking_status"]
# creating dummy variables
X = pd.get_dummies(X, drop_first=True)
# adding constant
X = sm.add_constant(X)
# splitting in training and test set
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
logit = sm.Logit(y_train, X_train.astype(float))
lg = logit.fit(
disp=False
) # setting disp=False will remove the information on number of iterations
print(lg.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25364
Method: MLE Df Model: 27
Date: Fri, 08 Jul 2022 Pseudo R-squ.: 0.3281
Time: 22:46:15 Log-Likelihood: -10812.
converged: False LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
const -940.1618 120.672 -7.791 0.000 -1176.674 -703.649
no_of_adults 0.1159 0.038 3.083 0.002 0.042 0.190
no_of_children 0.1706 0.057 3.008 0.003 0.059 0.282
no_of_weekend_nights 0.1060 0.020 5.365 0.000 0.067 0.145
no_of_week_nights 0.0412 0.012 3.357 0.001 0.017 0.065
required_car_parking_space -1.5839 0.138 -11.493 0.000 -1.854 -1.314
lead_time 0.0156 0.000 58.760 0.000 0.015 0.016
arrival_year 0.4646 0.060 7.770 0.000 0.347 0.582
arrival_month -0.0407 0.006 -6.301 0.000 -0.053 -0.028
arrival_date 0.0007 0.002 0.349 0.727 -0.003 0.004
repeated_guest -2.3405 0.615 -3.808 0.000 -3.545 -1.136
no_of_previous_cancellations 0.2676 0.086 3.126 0.002 0.100 0.435
no_of_previous_bookings_not_canceled -0.1728 0.153 -1.132 0.258 -0.472 0.126
avg_price_per_room 0.0194 0.001 24.978 0.000 0.018 0.021
no_of_special_requests -1.4680 0.030 -48.828 0.000 -1.527 -1.409
type_of_meal_plan_Meal Plan 2 0.2084 0.066 3.152 0.002 0.079 0.338
type_of_meal_plan_Meal Plan 3 35.2663 2.54e+07 1.39e-06 1.000 -4.97e+07 4.97e+07
type_of_meal_plan_Not Selected 0.2834 0.053 5.343 0.000 0.179 0.387
room_type_reserved_Room_Type 2 -0.3469 0.130 -2.661 0.008 -0.602 -0.091
room_type_reserved_Room_Type 3 -0.0004 1.308 -0.000 1.000 -2.564 2.563
room_type_reserved_Room_Type 4 -0.2894 0.053 -5.430 0.000 -0.394 -0.185
room_type_reserved_Room_Type 5 -0.6952 0.207 -3.363 0.001 -1.100 -0.290
room_type_reserved_Room_Type 6 -0.7425 0.144 -5.160 0.000 -1.025 -0.460
room_type_reserved_Room_Type 7 -0.7936 0.279 -2.847 0.004 -1.340 -0.247
market_segment_type_Complementary -54.1885 2.54e+07 -2.14e-06 1.000 -4.97e+07 4.97e+07
market_segment_type_Corporate -1.1907 0.266 -4.475 0.000 -1.712 -0.669
market_segment_type_Offline -2.1858 0.255 -8.586 0.000 -2.685 -1.687
market_segment_type_Online -0.4037 0.251 -1.607 0.108 -0.896 0.089
========================================================================================================
A negative coefficient means that the probability of a booking being canceled decreases as the corresponding attribute value increases.
A positive coefficient means that the probability of a booking being canceled increases as the corresponding attribute value increases.
p-value of a variable indicates if the variable is significant or not. If we consider the significance level to be 0.05 (5%), then any variable with a p-value less than 0.05 would be considered significant.
But these variables might contain multicollinearity, which will affect the p-values.
We will have to remove multicollinearity from the data to get reliable coefficients and p-values.
There are different ways of detecting (or testing for) multicollinearity; one such way is the Variance Inflation Factor (VIF).
Variance Inflation Factor: Variance inflation factors measure the inflation in the variances of the regression coefficient estimates due to collinearity that exists among the predictors. It is a measure of how much the variance of the estimated regression coefficient βk is "inflated" by the existence of correlation among the predictor variables in the model.
General rule of thumb: If VIF is 1, then there is no correlation between the kth predictor and the remaining predictor variables, and hence the variance of β̂k is not inflated at all. If VIF exceeds 5, we say there is moderate multicollinearity, and if it is 10 or more, it shows signs of high multicollinearity. But the purpose of the analysis should dictate which threshold to use.
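The definition above can be made concrete with a small sketch: regress each predictor on the others and apply VIF_k = 1 / (1 - R_k^2). This is an illustration on synthetic data, not the notebook's computation; the function name `vif` and the generated columns are assumptions introduced here, and only NumPy is used.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of every column of X (X has no constant column).

    Each column is regressed on the remaining columns plus an intercept,
    and VIF_k = 1 / (1 - R_k^2).
    """
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    factors = []
    for j in range(n_cols):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        r_squared = 1 - residuals.var() / y.var()
        factors.append(1.0 / (1.0 - r_squared))
    return np.array(factors)


# synthetic predictors: x3 is almost a copy of x1, x2 is independent noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + rng.normal(scale=0.1, size=500)
print(vif(np.column_stack([x1, x2, x3])))
```

The two nearly identical columns should show a very large VIF while the independent column stays close to 1, which is the pattern the rule of thumb is meant to flag.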
vif_series = pd.Series(
[variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
index=X_train.columns,
dtype=float,
)
print("Series before feature selection: \n\n{}\n".format(vif_series))
Series before feature selection: 

const                                 39518512.80380
no_of_adults                                 1.34940
no_of_children                               1.97448
no_of_weekend_nights                         1.07051
no_of_week_nights                            1.09675
required_car_parking_space                   1.03973
lead_time                                    1.39344
arrival_year                                 1.43269
arrival_month                                1.27733
arrival_date                                 1.00673
repeated_guest                               1.78408
no_of_previous_cancellations                 1.39569
no_of_previous_bookings_not_canceled         1.65202
avg_price_per_room                           1.97468
no_of_special_requests                       1.24753
type_of_meal_plan_Meal Plan 2                1.27064
type_of_meal_plan_Meal Plan 3                1.02513
type_of_meal_plan_Not Selected               1.27677
room_type_reserved_Room_Type 2               1.10216
room_type_reserved_Room_Type 3               1.00330
room_type_reserved_Room_Type 4               1.37157
room_type_reserved_Room_Type 5               1.02824
room_type_reserved_Room_Type 6               1.94031
room_type_reserved_Room_Type 7               1.09482
market_segment_type_Complementary            4.44776
market_segment_type_Corporate               16.92973
market_segment_type_Offline                 64.11699
market_segment_type_Online                  71.18185
dtype: float64
Let's remove the insignificant features (p-value>0.05).
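arrival_date, type_of_meal_plan_Meal Plan 3, room_type_reserved_Room_Type 3, market_segment_type_Complementary, no_of_previous_bookings_not_canceled and market_segment_type_Online have p-values > 0.05. So, they are not significant and we'll drop them.
But sometimes p-values change after dropping a variable. So, we'll not drop all variables at once.
Instead, we will do the following repeatedly using a loop:
Build a model, check the p-values of the variables, and drop the column with the highest p-value.
Build a new model without the dropped feature, check the p-values again, and drop the column with the highest p-value.
Repeat the above two steps until there are no columns with p-value > 0.05.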
The above process can also be done manually by picking one variable at a time that has a high p-value, dropping it, and building a model again. But that might be a little tedious and using a loop will be more efficient.
# initial list of columns
cols = X_train.columns.tolist()
# setting an initial max p-value
max_p_value = 1
while len(cols) > 0:
    # defining the train set
    x_train_aux = X_train[cols]
    # fitting the model
    model = sm.Logit(y_train, x_train_aux).fit(disp=False)
    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)
    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()
    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break
selected_features = cols
print(selected_features)
['const', 'no_of_adults', 'no_of_children', 'no_of_weekend_nights', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'arrival_year', 'arrival_month', 'repeated_guest', 'no_of_previous_cancellations', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Corporate', 'market_segment_type_Offline']
X_train1 = X_train[selected_features]
X_test1 = X_test[selected_features]
logit1 = sm.Logit(y_train, X_train1)
lg1 = logit1.fit(disp=False)
print(lg1.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25370
Method: MLE Df Model: 21
Date: Fri, 08 Jul 2022 Pseudo R-squ.: 0.3269
Time: 22:46:33 Log-Likelihood: -10831.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
==================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------
const -932.8355 120.294 -7.755 0.000 -1168.608 -697.063
no_of_adults 0.1114 0.037 2.993 0.003 0.038 0.184
no_of_children 0.1670 0.057 2.946 0.003 0.056 0.278
no_of_weekend_nights 0.1083 0.020 5.489 0.000 0.070 0.147
no_of_week_nights 0.0434 0.012 3.538 0.000 0.019 0.067
required_car_parking_space -1.5846 0.138 -11.495 0.000 -1.855 -1.314
lead_time 0.0156 0.000 59.137 0.000 0.015 0.016
arrival_year 0.4608 0.060 7.729 0.000 0.344 0.578
arrival_month -0.0417 0.006 -6.471 0.000 -0.054 -0.029
repeated_guest -2.7363 0.555 -4.934 0.000 -3.823 -1.649
no_of_previous_cancellations 0.2305 0.077 3.010 0.003 0.080 0.381
avg_price_per_room 0.0198 0.001 25.954 0.000 0.018 0.021
no_of_special_requests -1.4687 0.030 -48.925 0.000 -1.528 -1.410
type_of_meal_plan_Meal Plan 2 0.1971 0.066 2.984 0.003 0.068 0.326
type_of_meal_plan_Not Selected 0.2932 0.053 5.545 0.000 0.190 0.397
room_type_reserved_Room_Type 2 -0.3413 0.130 -2.621 0.009 -0.596 -0.086
room_type_reserved_Room_Type 4 -0.2902 0.053 -5.461 0.000 -0.394 -0.186
room_type_reserved_Room_Type 5 -0.7151 0.206 -3.476 0.001 -1.118 -0.312
room_type_reserved_Room_Type 6 -0.7594 0.144 -5.282 0.000 -1.041 -0.478
room_type_reserved_Room_Type 7 -0.8216 0.278 -2.958 0.003 -1.366 -0.277
market_segment_type_Corporate -0.7819 0.103 -7.603 0.000 -0.983 -0.580
market_segment_type_Offline -1.7706 0.052 -34.121 0.000 -1.872 -1.669
==================================================================================================
The coefficients of the logistic regression model are in terms of log(odds); to find the odds we have to take the exponential of the coefficients. Therefore, odds = exp(b).
The percentage change in odds is given as: percentage change in odds = (exp(b) - 1) * 100
# converting coefficients to odds
odds = np.exp(lg1.params)
# finding the percentage change
perc_change_odds = (np.exp(lg1.params) - 1) * 100
# removing limit from number of columns to display
pd.set_option("display.max_columns", None)
# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train1.columns).T
|  | const | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | repeated_guest | no_of_previous_cancellations | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Corporate | market_segment_type_Offline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odds | 0.00000 | 1.11789 | 1.18170 | 1.11434 | 1.04437 | 0.20502 | 1.01576 | 1.58534 | 0.95917 | 0.06481 | 1.25926 | 1.02000 | 0.23022 | 1.21782 | 1.34066 | 0.71086 | 0.74814 | 0.48915 | 0.46794 | 0.43972 | 0.45755 | 0.17023 |
| Change_odd% | -100.00000 | 11.78937 | 18.17030 | 11.43441 | 4.43660 | -79.49786 | 1.57587 | 58.53419 | -4.08255 | -93.51901 | 25.92632 | 1.99964 | -76.97791 | 21.78209 | 34.06613 | -28.91402 | -25.18572 | -51.08503 | -53.20575 | -56.02761 | -54.24526 | -82.97661 |
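As a cross-check of the table above, the same conversion can be applied by hand to a few coefficients. The values below are copied from the printed `lg1` summary, so they are rounded to four decimals and the results can differ from the table in the last digits; `odds_ratio` and `pct_change_in_odds` are hypothetical helper names used only for this sketch.

```python
import math

# coefficients copied from the lg1 summary above (log-odds scale, rounded)
coefficients = {
    "lead_time": 0.0156,
    "no_of_special_requests": -1.4687,
    "required_car_parking_space": -1.5846,
}


def odds_ratio(b):
    """Multiplicative change in the odds for a one-unit increase."""
    return math.exp(b)


def pct_change_in_odds(b):
    """Percentage change in the odds for a one-unit increase."""
    return (math.exp(b) - 1) * 100


for name, b in coefficients.items():
    print(
        f"{name}: odds ratio {odds_ratio(b):.4f}, "
        f"change in odds {pct_change_in_odds(b):+.2f}%"
    )
```

Read this way, each additional special request multiplies the odds of cancellation by roughly 0.23, holding the other variables fixed, while each extra day of lead time raises the odds by about 1.6%.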
# creating confusion matrix
confusion_matrix_statsmodels(lg1, X_train1, y_train)
log_reg_model_train_perf = model_performance_classification_statsmodels(
lg1, X_train1, y_train
)
print("Training performance:")
log_reg_model_train_perf
Training performance:
|  | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80565 | 0.63219 | 0.73985 | 0.68180 |
logit_roc_auc_train = roc_auc_score(y_train, lg1.predict(X_train1))
fpr, tpr, thresholds = roc_curve(y_train, lg1.predict(X_train1))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
# Optimal threshold as per AUC-ROC curve
# The optimal cut off would be where tpr is high and fpr is low
fpr, tpr, thresholds = roc_curve(y_train, lg1.predict(X_train1))
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
0.3622407033272864
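Choosing the threshold that maximizes `tpr - fpr` is commonly known as maximizing Youden's J statistic. The sketch below reproduces the idea in plain Python on made-up scores so that each step is visible; `roc_points` and `youden_threshold` are hypothetical helpers, not scikit-learn functions, and they treat an observation as positive when its score is greater than or equal to the threshold.

```python
def roc_points(y_true, scores):
    """Return (threshold, fpr, tpr) for every distinct score, highest first.

    An observation is predicted positive when score >= threshold.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def youden_threshold(y_true, scores):
    """Threshold that maximizes tpr - fpr; ties go to the highest threshold."""
    best = max(roc_points(y_true, scores), key=lambda point: point[2] - point[1])
    return best[0]


# made-up labels and scores, for illustration only
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.65, 0.9]
print(youden_threshold(y_true, scores))
```

This criterion weights the true positive rate and the false positive rate equally, so it is one reasonable default rather than the only defensible choice of cut-off.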
# creating confusion matrix
confusion_matrix_statsmodels(
lg1, X_train1, y_train, threshold=optimal_threshold_auc_roc
)
# checking model performance for this model
log_reg_model_train_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg1, X_train1, y_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")
log_reg_model_train_perf_threshold_auc_roc
Training performance:
|  | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.79005 | 0.73897 | 0.66252 | 0.69866 |
y_scores = lg1.predict(X_train1)
prec, rec, tre = precision_recall_curve(y_train, y_scores,)
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])
plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
# setting the threshold
optimal_threshold_curve = 0.42
# creating confusion matrix
confusion_matrix_statsmodels(
lg1, X_train1, y_train, threshold=optimal_threshold_curve
)
log_reg_model_train_perf_threshold_curve = model_performance_classification_statsmodels(
lg1, X_train1, y_train, threshold=optimal_threshold_curve
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve
Training performance:
|  | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80002 | 0.69568 | 0.69668 | 0.69618 |
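At the 0.42 cut-off, precision and recall in the table above are almost equal, which suggests the value was read off the plot near the point where the two curves cross. A minimal sketch of locating such a crossing programmatically is given below. `crossover_threshold` is a hypothetical helper, the input arrays are made up, and the inputs are assumed to be shaped like the arrays returned by `precision_recall_curve` (precision and recall each carry one more element than the thresholds).

```python
import numpy as np


def crossover_threshold(precisions, recalls, thresholds):
    """Threshold at which precision and recall are closest to each other.

    The trailing element of precisions and recalls has no matching
    threshold, so it is dropped before comparing.
    """
    precisions = np.asarray(precisions, dtype=float)[:-1]
    recalls = np.asarray(recalls, dtype=float)[:-1]
    thresholds = np.asarray(thresholds, dtype=float)
    gap = np.abs(precisions - recalls)
    return float(thresholds[np.argmin(gap)])


# made-up curve values, for illustration only
prec_demo = [0.40, 0.55, 0.68, 0.80, 0.95, 1.00]
rec_demo = [1.00, 0.90, 0.70, 0.50, 0.20, 0.00]
thr_demo = [0.10, 0.30, 0.42, 0.60, 0.85]
print(crossover_threshold(prec_demo, rec_demo, thr_demo))
```

In the notebook the call would presumably be `crossover_threshold(prec, rec, tre)`; whether it lands exactly on 0.42 depends on the fitted model and has not been verified here.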
# training performance comparison
models_train_comp_df = pd.concat(
[
log_reg_model_train_perf.T,
log_reg_model_train_perf_threshold_auc_roc.T,
log_reg_model_train_perf_threshold_curve.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Logistic Regression statsmodel",
"Logistic Regression-0.36 Threshold",
"Logistic Regression-0.42 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
|  | Logistic Regression statsmodel | Logistic Regression-0.36 Threshold | Logistic Regression-0.42 Threshold |
|---|---|---|---|
| Accuracy | 0.80565 | 0.79005 | 0.80002 |
| Recall | 0.63219 | 0.73897 | 0.69568 |
| Precision | 0.73985 | 0.66252 | 0.69668 |
| F1 | 0.68180 | 0.69866 | 0.69618 |
The model performs better with the 0.36 threshold.
The F1 score is highest with the 0.36 threshold (0.69866 on the training set).
# creating confusion matrix
confusion_matrix_statsmodels(lg1, X_test1, y_test)
log_reg_model_test_perf = model_performance_classification_statsmodels(
lg1, X_test1, y_test
)
print("Test performance:")
log_reg_model_test_perf
Test performance:
|  | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80355 | 0.63032 | 0.72644 | 0.67498 |
logit_roc_auc_test = roc_auc_score(y_test, lg1.predict(X_test1))
fpr, tpr, thresholds = roc_curve(y_test, lg1.predict(X_test1))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_test)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
# creating confusion matrix
confusion_matrix_statsmodels(lg1, X_test1, y_test, threshold=optimal_threshold_auc_roc)
# checking model performance for this model
log_reg_model_test_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg1, X_test1, y_test, threshold=optimal_threshold_auc_roc
)
print("Test performance:")
log_reg_model_test_perf_threshold_auc_roc
Test performance:
|  | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.79371 | 0.74390 | 0.66112 | 0.70007 |
# creating confusion matrix
confusion_matrix_statsmodels(lg1, X_test1, y_test, threshold=optimal_threshold_curve)
log_reg_model_test_perf_threshold_curve = model_performance_classification_statsmodels(
lg1, X_test1, y_test, threshold=optimal_threshold_curve
)
print("Test performance:")
log_reg_model_test_perf_threshold_curve
Test performance:
|  | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80300 | 0.70187 | 0.69321 | 0.69752 |
# training performance comparison
models_train_comp_df = pd.concat(
[
log_reg_model_train_perf.T,
log_reg_model_train_perf_threshold_auc_roc.T,
log_reg_model_train_perf_threshold_curve.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Logistic Regression statsmodel",
"Logistic Regression-0.36 Threshold",
"Logistic Regression-0.42 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
|  | Logistic Regression statsmodel | Logistic Regression-0.36 Threshold | Logistic Regression-0.42 Threshold |
|---|---|---|---|
| Accuracy | 0.80565 | 0.79005 | 0.80002 |
| Recall | 0.63219 | 0.73897 | 0.69568 |
| Precision | 0.73985 | 0.66252 | 0.69668 |
| F1 | 0.68180 | 0.69866 | 0.69618 |
# testing performance comparison
models_test_comp_df = pd.concat(
[
log_reg_model_test_perf.T,
log_reg_model_test_perf_threshold_auc_roc.T,
log_reg_model_test_perf_threshold_curve.T,
],
axis=1,
)
models_test_comp_df.columns = [
"Logistic Regression statsmodel",
"Logistic Regression-0.36 Threshold",
"Logistic Regression-0.42 Threshold",
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
|  | Logistic Regression statsmodel | Logistic Regression-0.36 Threshold | Logistic Regression-0.42 Threshold |
|---|---|---|---|
| Accuracy | 0.80355 | 0.79371 | 0.80300 |
| Recall | 0.63032 | 0.74390 | 0.70187 |
| Precision | 0.72644 | 0.66112 | 0.69321 |
| F1 | 0.67498 | 0.70007 | 0.69752 |
X = hotel.drop("booking_status", axis=1)
Y = hotel["booking_status"]
# creating dummy variables
X = pd.get_dummies(X, drop_first=True)
# splitting in training and test set
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
print("Number of rows in train data =", X_train.shape[0])
print("Number of rows in test data =", X_test.shape[0])
Number of rows in train data = 25392
Number of rows in test data = 10883
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in training set:
0   0.67064
1   0.32936
Name: booking_status, dtype: float64
Percentage of classes in test set:
0   0.67638
1   0.32362
Name: booking_status, dtype: float64
model = DecisionTreeClassifier(criterion="gini", random_state=1)
model.fit(X_train, y_train)
DecisionTreeClassifier(random_state=1)
The model can make two kinds of wrong predictions:
Predicting a booking will not be canceled but in reality the booking gets canceled.
Predicting a booking will be canceled but in reality the booking does not get canceled.
Recall should be maximized: the greater the recall, the higher the chances of minimizing the false negatives.
# defining a function to compute different metrics to check performance of a classification model built using sklearn
def model_performance_classification_sklearn(model, predictors, target):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    # predicting using the independent variables
    pred = model.predict(predictors)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )
    return df_perf
def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
decision_tree_perf_train = model_performance_classification_sklearn(
model, X_train, y_train
)
decision_tree_perf_train
|  | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.99421 | 0.98661 | 0.99578 | 0.99117 |
confusion_matrix_sklearn(model, X_train, y_train)
decision_tree_perf_test = model_performance_classification_sklearn(
model, X_test, y_test
)
decision_tree_perf_test
|  | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.87062 | 0.80892 | 0.79492 | 0.80186 |
confusion_matrix_sklearn(model, X_test, y_test)
Before pruning the tree let's check the important features.
# lets get all the columns name
column_names = list(X.columns)
feature_names = column_names
print(feature_names)
['no_of_adults', 'no_of_children', 'no_of_weekend_nights', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'arrival_year', 'arrival_month', 'arrival_date', 'repeated_guest', 'no_of_previous_cancellations', 'no_of_previous_bookings_not_canceled', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Meal Plan 3', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 3', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Complementary', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'market_segment_type_Online']
#lets visualize the decision tree with the feature names
plt.figure(figsize=(20, 30))
out = tree.plot_tree(
model,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=True,
class_names=True,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_weekend_nights <= 0.50 | | | | | |--- avg_price_per_room <= 179.47 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- lead_time <= 16.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | | |--- lead_time <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- lead_time > 11.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | | |--- weights: [147.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- lead_time > 16.50 | | | | | | | | |--- avg_price_per_room <= 135.00 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- lead_time <= 17.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 17.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [29.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 135.00 | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- weights: [1606.00, 0.00] class: 0 | | | | | |--- avg_price_per_room > 179.47 | | | | | | |--- avg_price_per_room <= 
179.78 | | | | | | | |--- arrival_date <= 29.00 | | | | | | | | |--- weights: [0.00, 17.00] class: 1 | | | | | | | |--- arrival_date > 29.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 179.78 | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | |--- no_of_weekend_nights > 0.50 | | | | | |--- lead_time <= 68.50 | | | | | | |--- no_of_weekend_nights <= 4.50 | | | | | | | |--- lead_time <= 1.50 | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | | |--- market_segment_type_Complementary <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- market_segment_type_Complementary > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | | |--- arrival_date <= 28.50 | | | | | | | | | | | |--- weights: [0.00, 33.00] class: 1 | | | | | | | | | | |--- arrival_date > 28.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- lead_time > 1.50 | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | 
| | |--- arrival_date <= 19.50 | | | | | | | | | | | |--- weights: [22.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | |--- lead_time <= 65.50 | | | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | | | |--- weights: [127.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 65.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- no_of_weekend_nights > 4.50 | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | |--- lead_time > 68.50 | | | | | | |--- avg_price_per_room <= 99.98 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- avg_price_per_room <= 62.50 | | | | | | | | | |--- weights: [21.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 62.50 | | | | | | | | | |--- lead_time <= 77.00 | | | | | | | | | | |--- lead_time <= 72.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 72.00 | | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | | | |--- lead_time > 77.00 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- lead_time <= 71.50 | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- lead_time > 71.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | 
| | | | |--- lead_time <= 88.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 88.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- lead_time <= 73.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- lead_time > 73.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | |--- avg_price_per_room > 99.98 | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | |--- arrival_date <= 17.00 | | | | | | | | | |--- weights: [0.00, 52.00] class: 1 | | | | | | | | |--- arrival_date > 17.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | |--- avg_price_per_room <= 105.20 | | | | | | | | | |--- arrival_date <= 22.00 | | | | | | | | | | |--- lead_time <= 75.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 75.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 22.00 | | | | | | | | | | |--- lead_time <= 78.50 | | | | | | | | | | | |--- weights: [0.00, 22.00] class: 1 | | | | | | | | | | |--- lead_time > 78.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 105.20 | | | | | | | | | |--- lead_time <= 88.50 | | | | | | | | | | |--- arrival_date <= 3.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- arrival_date > 3.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 88.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 117.50 | | | | | |--- avg_price_per_room <= 93.58 | | | | | | |--- arrival_date <= 6.50 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- 
type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- arrival_date <= 5.50 | | | | | | | | | |--- weights: [35.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 5.50 | | | | | | | | | |--- arrival_month <= 9.00 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 9.00 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- arrival_date > 6.50 | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | |--- avg_price_per_room <= 62.25 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 62.25 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | |--- weights: [24.00, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 
58.75 | | | | | | | | | | |--- lead_time <= 97.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 97.50 | | | | | | | | | | | |--- weights: [0.00, 39.00] class: 1 | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- lead_time <= 96.00 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 96.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- avg_price_per_room <= 82.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.00 | | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | | | |--- no_of_weekend_nights > 1.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 82.50 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- weights: [11.00, 2.00] class: 0 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- avg_price_per_room > 93.58 | | | | | | |--- arrival_date <= 16.50 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- lead_time <= 108.50 | | | | | | | | | |--- avg_price_per_room <= 125.00 | | | | | | | | | | |--- lead_time <= 107.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 107.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 125.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time > 
108.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.00 | | | | | | | | | | | |--- weights: [12.00, 1.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 7.50 | | | | | | | | |--- avg_price_per_room <= 108.50 | | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | | |--- lead_time <= 113.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 113.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | | |--- weights: [0.00, 47.00] class: 1 | | | | | | | | |--- avg_price_per_room > 108.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- weights: [42.00, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- arrival_date > 16.50 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- avg_price_per_room <= 127.39 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- weights: [0.00, 50.00] class: 1 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- avg_price_per_room > 127.39 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- arrival_year > 
2017.50 | | | | | | | | | |--- avg_price_per_room <= 101.34 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 101.34 | | | | | | | | | | |--- avg_price_per_room <= 165.11 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 165.11 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- lead_time > 117.50 | | | | | |--- no_of_week_nights <= 1.50 | | | | | | |--- arrival_date <= 7.50 | | | | | | | |--- weights: [51.00, 0.00] class: 0 | | | | | | |--- arrival_date > 7.50 | | | | | | | |--- avg_price_per_room <= 93.58 | | | | | | | | |--- avg_price_per_room <= 65.38 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- avg_price_per_room > 65.38 | | | | | | | | | |--- avg_price_per_room <= 89.88 | | | | | | | | | | |--- weights: [24.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 89.88 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- weights: [8.00, 2.00] class: 0 | | | | | | | |--- avg_price_per_room > 93.58 | | | | | | | | |--- arrival_date <= 28.00 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | | |--- weights: [0.00, 17.00] class: 1 | | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- arrival_month <= 7.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 7.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- arrival_date > 28.00 | | | | | | | | | |--- weights: [13.00, 1.00] class: 0 | | | | | |--- no_of_week_nights > 1.50 | | | | | | |--- no_of_adults <= 1.50 | | | | | | | |--- weights: 
[113.00, 0.00] class: 0 | | | | | | |--- no_of_adults > 1.50 | | | | | | | |--- lead_time <= 125.50 | | | | | | | | |--- avg_price_per_room <= 90.85 | | | | | | | | | |--- avg_price_per_room <= 87.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 87.50 | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | | |--- avg_price_per_room > 90.85 | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | | |--- lead_time > 125.50 | | | | | | | | |--- avg_price_per_room <= 155.78 | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | |--- arrival_date <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 10.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | |--- lead_time <= 128.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 128.00 | | | | | | | | | | | |--- weights: [75.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 155.78 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 13.50 | | | | |--- avg_price_per_room <= 119.42 | | | | | |--- arrival_month <= 8.50 | | | | | | |--- arrival_month <= 1.50 | | | | | | | |--- weights: [128.00, 0.00] class: 0 | | | | | | |--- arrival_month > 1.50 | | | | | | | |--- lead_time <= 3.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- avg_price_per_room <= 106.50 | | | | | | | | | | |--- avg_price_per_room <= 74.57 | | | | | | | | | | | |--- weights: [37.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 74.57 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | |--- avg_price_per_room > 106.50 | 
| | | | | | | | | |--- weights: [38.00, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- avg_price_per_room <= 75.46 | | | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | | | |--- weights: [0.00, 12.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 75.46 | | | | | | | | | | |--- lead_time <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- lead_time > 3.50 | | | | | | | | |--- avg_price_per_room <= 99.38 | | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | | |--- avg_price_per_room <= 68.38 | | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 68.38 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | |--- avg_price_per_room > 99.38 | | | | | | | | | |--- avg_price_per_room <= 117.25 | | | | | | | | | | |--- avg_price_per_room <= 101.67 | | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 101.67 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- avg_price_per_room > 117.25 | | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | |--- arrival_month > 8.50 | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | |--- avg_price_per_room <= 117.56 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [148.00, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- lead_time <= 
8.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- lead_time > 8.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [69.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 117.56 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- lead_time <= 10.00 | | | | | | | | | | |--- arrival_date <= 25.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 25.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 10.00 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 119.42 | | | | | |--- lead_time <= 3.50 | | | | | | |--- avg_price_per_room <= 178.78 | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- avg_price_per_room <= 134.50 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 134.50 | | | | | | | | | | |--- avg_price_per_room <= 136.09 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 136.09 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | |--- avg_price_per_room <= 169.67 | | | | | | | | | | |--- arrival_date <= 5.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | 
| | | | |--- arrival_date > 5.00 | | | | | | | | | | | |--- weights: [53.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 169.67 | | | | | | | | | | |--- arrival_date <= 11.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 11.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- avg_price_per_room > 178.78 | | | | | | | |--- avg_price_per_room <= 182.00 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 182.00 | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | |--- lead_time > 3.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | |--- avg_price_per_room <= 160.50 | | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- 
avg_price_per_room > 160.50 | | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | | |--- weights: [0.00, 25.00] class: 1 | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | |--- lead_time <= 5.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 5.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- no_of_week_nights <= 2.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- no_of_week_nights > 2.00 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 14.00 | | | | | | | | |--- avg_price_per_room <= 161.17 | | | | | | | | | |--- lead_time <= 12.50 | | | | | | | | | | |--- weights: [30.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 12.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | 
|--- avg_price_per_room > 161.17 | | | | | | | | | |--- avg_price_per_room <= 166.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 166.50 | | | | | | | | | | |--- no_of_children <= 1.00 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_children > 1.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | |--- lead_time > 13.50 | | | | |--- avg_price_per_room <= 105.27 | | | | | |--- avg_price_per_room <= 60.07 | | | | | | |--- lead_time <= 84.50 | | | | | | | |--- lead_time <= 51.50 | | | | | | | | |--- lead_time <= 50.50 | | | | | | | | | |--- avg_price_per_room <= 29.04 | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 29.04 | | | | | | | | | | |--- avg_price_per_room <= 49.84 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 49.84 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- lead_time > 50.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- lead_time > 51.50 | | | | | | | | |--- weights: [32.00, 0.00] class: 0 | | | | | | |--- lead_time > 84.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- arrival_date <= 19.00 | | | | | | | | | |--- lead_time <= 139.00 | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | | |--- lead_time > 139.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 19.00 | | | | | | | | | |--- lead_time <= 87.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 87.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- avg_price_per_room <= 59.43 | | | 
| | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 59.43 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- avg_price_per_room > 60.07 | | | | | | |--- lead_time <= 25.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [29.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- lead_time <= 14.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- lead_time > 14.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [54.00, 0.00] class: 0 | | | | | | |--- lead_time > 25.50 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- lead_time <= 60.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- lead_time > 60.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 28 | | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- arrival_month <= 5.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 5.00 | | | | | | | | | | |--- no_of_week_nights <= 3.00 | | | | | | | | | | | |--- weights: [0.00, 35.00] class: 1 
| | | | | | | | | | |--- no_of_week_nights > 3.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- lead_time <= 57.00 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 57.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 105.27 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- arrival_month <= 10.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- weights: [0.00, 13.00] class: 1 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 171.22 | | | | | | | | | | |--- arrival_date <= 24.50 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | | |--- arrival_date > 24.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | |--- avg_price_per_room > 171.22 | | | | | | | | | | |--- avg_price_per_room <= 181.24 | | 
| | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 181.24 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | |--- arrival_date <= 26.50 | | | | | | | | | | |--- avg_price_per_room <= 175.71 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 175.71 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- arrival_date > 26.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | |--- arrival_month > 10.50 | | | | | | | |--- lead_time <= 22.50 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- weights: [22.00, 0.00] class: 0 | | | | | | | |--- lead_time > 22.50 | | | | | | | | |--- avg_price_per_room <= 168.06 | | | | | | | | | |--- avg_price_per_room <= 147.75 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 147.75 | | | | | | | | | | |--- weights: [0.00, 15.00] class: 1 | | | | | | | | |--- avg_price_per_room > 168.06 | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | |--- lead_time <= 80.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 80.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- no_of_week_nights <= 8.00 | | | | | | | |--- weights: [39.00, 0.00] class: 0 | | | 
| | | |--- no_of_week_nights > 8.00 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | |--- lead_time <= 102.50 | | | | | | |--- no_of_week_nights <= 11.00 | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | |--- lead_time <= 91.50 | | | | | | | | | |--- avg_price_per_room <= 129.50 | | | | | | | | | | |--- weights: [848.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 129.50 | | | | | | | | | | |--- avg_price_per_room <= 131.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 131.50 | | | | | | | | | | | |--- weights: [27.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 91.50 | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | |--- weights: [43.00, 0.00] class: 0 | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | |--- lead_time <= 95.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- lead_time > 95.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | |--- avg_price_per_room <= 164.79 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- avg_price_per_room > 164.79 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- no_of_week_nights > 11.00 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- lead_time > 102.50 | | | | | | |--- lead_time <= 104.50 | | | | | | | 
|--- no_of_week_nights <= 2.50 | | | | | | | | |--- avg_price_per_room <= 67.65 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 67.65 | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | |--- lead_time > 104.50 | | | | | | | |--- avg_price_per_room <= 141.75 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- avg_price_per_room <= 83.39 | | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 83.39 | | | | | | | | | | |--- lead_time <= 143.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 143.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- avg_price_per_room <= 122.00 | | | | | | | | | | |--- weights: [54.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 122.00 | | | | | | | | | | |--- room_type_reserved_Room_Type 7 <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- room_type_reserved_Room_Type 7 > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 141.75 | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | |--- lead_time <= 63.00 | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | |--- weights: [18.00, 0.00] class: 0 | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | 
| | | | |--- arrival_date > 14.50 | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | |--- lead_time > 63.00 | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | |--- market_segment_type_Online > 0.50 | | | | |--- lead_time <= 8.50 | | | | | |--- lead_time <= 4.50 | | | | | | |--- no_of_week_nights <= 10.00 | | | | | | | |--- avg_price_per_room <= 157.64 | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | |--- arrival_date <= 4.50 | | | | | | | | | | |--- weights: [81.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 4.50 | | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | | |--- weights: [69.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- avg_price_per_room > 157.64 | | | | | | | | |--- avg_price_per_room <= 158.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- avg_price_per_room > 158.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- lead_time <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | |--- no_of_week_nights > 10.00 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- lead_time > 4.50 | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | |--- avg_price_per_room <= 123.60 | 
| | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- arrival_date <= 13.50 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- arrival_date > 13.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- weights: [37.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | |--- weights: [95.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 123.60 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- arrival_date <= 15.50 | | | | | | | | | | |--- avg_price_per_room <= 128.91 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 128.91 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- arrival_date > 15.50 | | | | | | | | | | |--- lead_time <= 6.50 | | | | | | | | | | | |--- weights: [42.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 6.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- lead_time <= 6.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 6.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | |--- avg_price_per_room <= 94.48 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- 
avg_price_per_room > 94.48 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | |--- lead_time > 8.50 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- avg_price_per_room <= 127.62 | | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | | |--- lead_time <= 43.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | | |--- weights: [87.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [128.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 43.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | | |--- truncated branch of depth 21 | | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- avg_price_per_room <= 119.12 | | | | | | | | | | |--- no_of_week_nights <= 8.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_week_nights > 8.50 | | | | | | | | | | | |--- weights: [0.00, 12.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 119.12 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 127.62 | | | | | | | |--- lead_time <= 142.50 | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | |--- avg_price_per_room <= 177.15 | | | | | | | | | | | 
|--- truncated branch of depth 15 | | | | | | | | | | |--- avg_price_per_room > 177.15 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- lead_time <= 100.50 | | | | | | | | | | | |--- weights: [49.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 100.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- lead_time > 142.50 | | | | | | | | |--- avg_price_per_room <= 142.65 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- avg_price_per_room > 142.65 | | | | | | | | | |--- avg_price_per_room <= 182.93 | | | | | | | | | | |--- weights: [0.00, 12.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 182.93 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- no_of_week_nights <= 7.50 | | | | | | | |--- weights: [180.00, 0.00] class: 0 | | | | | | |--- no_of_week_nights > 7.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_week_nights 
<= 3.50 | | | | | |--- weights: [2126.00, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- lead_time <= 6.50 | | | | | | | | |--- weights: [43.00, 0.00] class: 0 | | | | | | | |--- lead_time > 6.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_date <= 3.00 | | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 3.00 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [17.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- weights: [34.00, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- lead_time <= 80.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 80.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [70.00, 0.00] class: 0 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | |--- lead_time > 90.50 | | | | |--- no_of_special_requests <= 2.50 | | | | | |--- arrival_month <= 8.50 | | | | | | |--- lead_time <= 150.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | |--- arrival_date <= 4.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 4.50 | | | | | | | | | | |--- arrival_date <= 26.00 | | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | | |--- arrival_date > 26.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | |--- arrival_date <= 14.00 | | 
| | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | | |--- avg_price_per_room <= 84.14 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 84.14 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 157.50 | | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 157.50 | | | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | |--- no_of_children <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- no_of_children > 2.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | |--- arrival_date <= 16.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- arrival_date > 16.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- lead_time > 150.50 | | | | | | | |--- avg_price_per_room <= 103.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 103.50 | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | |--- arrival_month > 8.50 | | | | | | |--- avg_price_per_room <= 90.42 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | |--- avg_price_per_room <= 70.52 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | 
|--- avg_price_per_room > 70.52 | | | | | | | | | | |--- lead_time <= 123.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 123.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 30.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- lead_time <= 101.00 | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 101.00 | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | |--- avg_price_per_room > 90.42 | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | |--- avg_price_per_room <= 153.15 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- lead_time <= 148.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 148.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- avg_price_per_room > 153.15 | | | | | | | | | |--- 
lead_time <= 100.00 | | | | | | | | | | |--- lead_time <= 96.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 96.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 100.00 | | | | | | | | | | |--- lead_time <= 148.50 | | | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 148.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | |--- no_of_special_requests > 2.50 | | | | | |--- weights: [90.00, 0.00] class: 0 |--- lead_time > 151.50 | |--- avg_price_per_room <= 100.04 | | |--- no_of_special_requests <= 0.50 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- no_of_adults <= 1.50 | | | | | |--- lead_time <= 163.50 | | | | | | |--- arrival_date <= 7.00 | | | | | | | |--- weights: [0.00, 15.00] class: 1 | | | | | | |--- arrival_date > 7.00 | | | | | | | |--- avg_price_per_room <= 62.50 | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | |--- avg_price_per_room > 62.50 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | |--- lead_time > 163.50 | | | | | | |--- lead_time <= 341.00 | | | | | | | |--- lead_time <= 173.00 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- avg_price_per_room <= 88.25 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 88.25 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- weights: [61.00, 6.00] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- arrival_month <= 5.00 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 5.00 | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- avg_price_per_room <= 88.00 | 
| | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 88.00 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- avg_price_per_room <= 98.00 | | | | | | | | | | |--- avg_price_per_room <= 55.21 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 55.21 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 98.00 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- lead_time > 341.00 | | | | | | | |--- no_of_week_nights <= 4.00 | | | | | | | | |--- lead_time <= 402.00 | | | | | | | | | |--- avg_price_per_room <= 80.00 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 80.00 | | | | | | | | | | |--- no_of_weekend_nights <= 1.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_weekend_nights > 1.00 | | | | | | | | | | | |--- weights: [3.00, 2.00] class: 0 | | | | | | | | |--- lead_time > 402.00 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- no_of_week_nights > 4.00 | | | | | | | | |--- avg_price_per_room <= 88.33 | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | |--- avg_price_per_room > 88.33 | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | |--- no_of_adults > 1.50 | | | | | |--- avg_price_per_room <= 84.58 | | | | | | |--- lead_time <= 244.00 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- arrival_date <= 19.00 | | | | | | | | | | |--- lead_time <= 166.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 166.50 | | | | | | | | | | | 
|--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 19.00 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- weights: [24.00, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- arrival_date <= 16.00 | | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | | | |--- arrival_date > 16.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 75.75 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 75.75 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- lead_time > 244.00 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [34.00, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [37.00, 0.00] class: 0 | | | | | |--- 
avg_price_per_room > 84.58 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | |--- weights: [0.00, 13.00] class: 1 | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | |--- market_segment_type_Online > 0.50 | | | | |--- avg_price_per_room <= 35.17 | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | |--- arrival_date <= 12.50 | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- arrival_date > 12.50 | | | | | | | |--- arrival_date <= 19.00 | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | |--- arrival_date > 19.00 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 35.17 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- weights: [0.00, 523.00] class: 1 | | | | | |--- arrival_month > 11.50 | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | 
|--- lead_time <= 263.50 | | | | | | | | |--- avg_price_per_room <= 76.87 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | |--- avg_price_per_room > 76.87 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- lead_time > 263.50 | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- weights: [0.00, 58.00] class: 1 | | |--- no_of_special_requests > 0.50 | | | |--- no_of_weekend_nights <= 0.50 | | | | |--- lead_time <= 180.50 | | | | | |--- lead_time <= 159.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | |--- lead_time <= 156.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time > 156.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 23.50 | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | |--- lead_time > 159.50 | | | | | | |--- no_of_adults <= 0.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- no_of_adults > 0.50 | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 1.50 | | | | | | | | |--- weights: [48.00, 0.00] class: 0 | | | | |--- lead_time > 180.50 | | | | | |--- 
market_segment_type_Online <= 0.50 | | | | | | |--- no_of_adults <= 2.50 | | | | | | | |--- lead_time <= 356.00 | | | | | | | | |--- lead_time <= 302.50 | | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 302.50 | | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | | |--- lead_time > 356.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- no_of_adults > 2.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- market_segment_type_Online > 0.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- avg_price_per_room <= 44.12 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 44.12 | | | | | | | | | |--- weights: [0.00, 125.00] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- lead_time <= 300.50 | | | | | | | | | |--- lead_time <= 226.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- lead_time > 226.50 | | | | | | | | | | |--- lead_time <= 272.00 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 272.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- lead_time > 300.50 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [12.00, 0.00] class: 0 | | | |--- no_of_weekend_nights > 0.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 348.50 | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | |--- arrival_date <= 30.00 | | | | | | | | |--- weights: [137.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 30.00 | | | | | | | | |--- lead_time <= 168.00 | | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | | | |--- lead_time > 168.00 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | |--- 
lead_time <= 167.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- lead_time > 167.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- lead_time > 348.50 | | | | | | |--- lead_time <= 372.50 | | | | | | | |--- avg_price_per_room <= 58.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 58.50 | | | | | | | | |--- weights: [6.00, 2.00] class: 0 | | | | | | |--- lead_time > 372.50 | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | |--- avg_price_per_room <= 81.12 | | | | | | | | | |--- lead_time <= 153.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 153.50 | | | | | | | | | | |--- lead_time <= 157.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 157.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- avg_price_per_room > 81.12 | | | | | | | | | |--- no_of_week_nights <= 6.50 | | | | | | | | | | |--- lead_time <= 233.00 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- lead_time > 233.00 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- no_of_week_nights > 6.50 | | | | | | | | | | |--- lead_time <= 204.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 204.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- arrival_date > 27.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- lead_time <= 224.50 | | | | | | | | | | |--- lead_time <= 175.50 | | | | | | | | | | | |--- weights: 
[1.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 175.50 | | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | | | |--- lead_time > 224.50 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- lead_time <= 269.00 | | | | | | | | | | |--- lead_time <= 176.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 176.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- lead_time > 269.00 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | |--- arrival_date <= 3.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_date > 3.00 | | | | | | | | | |--- avg_price_per_room <= 64.43 | | | | | | | | | | |--- lead_time <= 217.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 217.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 64.43 | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 14.50 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | |--- lead_time <= 281.50 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 281.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | 
| | | | | | |--- avg_price_per_room <= 82.74 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 82.74 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- lead_time <= 198.00 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- lead_time > 198.00 | | | | | | | |--- weights: [0.00, 7.00] class: 1 | |--- avg_price_per_room > 100.04 | | |--- arrival_month <= 11.50 | | | |--- no_of_special_requests <= 2.50 | | | | |--- weights: [0.00, 2108.00] class: 1 | | | |--- no_of_special_requests > 2.50 | | | | |--- weights: [31.00, 0.00] class: 0 | | |--- arrival_month > 11.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- weights: [47.00, 0.00] class: 0 | | | |--- no_of_special_requests > 0.50 | | | | |--- arrival_date <= 24.50 | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | |--- arrival_date > 24.50 | | | | | |--- lead_time <= 172.50 | | | | | | |--- avg_price_per_room <= 135.49 | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 135.49 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- lead_time > 172.50 | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- no_of_weekend_nights <= 2.00 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 2.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- weights: [0.00, 4.00] class: 1
importances = model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
Pre-Pruning
# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1, class_weight="balanced")
# Grid of parameters to choose from
parameters = {
"max_depth": np.arange(2, 7, 2),
"max_leaf_nodes": [50, 75, 150, 250],
"min_samples_split": [10, 30, 50, 70],
}
# Type of scoring used to compare parameter combinations
acc_scorer = make_scorer(f1_score)
# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=acc_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator.fit(X_train, y_train)
DecisionTreeClassifier(class_weight='balanced', max_depth=6, max_leaf_nodes=50,
min_samples_split=10, random_state=1)
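The grid search above reports only the winning estimator. As a minimal sketch of how the search results themselves can be inspected, the block below runs a smaller grid on synthetic data. The data from `make_classification` and the names `X_demo`, `y_demo`, `demo_grid` and `demo_results` are assumptions for illustration, not part of the hotel dataset or the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for X_train / y_train (assumption: NOT the hotel data)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, weights=[0.67, 0.33], random_state=1
)

# Same estimator and scorer as above, with a reduced grid to keep the run short
demo_grid = GridSearchCV(
    DecisionTreeClassifier(random_state=1, class_weight="balanced"),
    {"max_depth": np.arange(2, 7, 2), "min_samples_split": [10, 30]},
    scoring=make_scorer(f1_score),
    cv=5,
)
demo_grid.fit(X_demo, y_demo)

# best_params_ is the winning combination; best_score_ is its mean cross-validated F1
print(demo_grid.best_params_, round(demo_grid.best_score_, 4))

# cv_results_ holds one row per parameter combination
demo_results = pd.DataFrame(demo_grid.cv_results_)[
    ["param_max_depth", "param_min_samples_split", "mean_test_score", "rank_test_score"]
].sort_values("rank_test_score")
print(demo_results)
```

With the default `refit=True`, `best_estimator_` is already refitted on the full data passed to `fit`, so a further call to `fit` on the same data should reproduce the same tree.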
decision_tree_tune_perf_train = model_performance_classification_sklearn(
estimator, X_train, y_train
)
decision_tree_tune_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.83085 | 0.78608 | 0.72401 | 0.75377 |
confusion_matrix_sklearn(estimator, X_train, y_train)
decision_tree_tune_perf_test = model_performance_classification_sklearn(
estimator, X_test, y_test
)
decision_tree_tune_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.83497 | 0.78336 | 0.72758 | 0.75444 |
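The F1 values reported for the pre-pruned tree can be cross-checked from the reported precision and recall, since F1 is their harmonic mean. The sketch below only re-uses the rounded numbers printed in the two tables. The names `reported`, `f1_from` and `recomputed` are introduced here for illustration.

```python
# Recall, precision and F1 copied from the train and test tables (rounded to 5 decimals)
reported = {
    "train": {"recall": 0.78608, "precision": 0.72401, "f1": 0.75377},
    "test": {"recall": 0.78336, "precision": 0.72758, "f1": 0.75444},
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {
    name: f1_from(vals["precision"], vals["recall"]) for name, vals in reported.items()
}

for name, value in recomputed.items():
    print(name, "recomputed:", round(value, 5), "reported:", reported[name]["f1"])
```

The train and test F1 values are very close, which suggests that the pre-pruned tree is not overfitting.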
confusion_matrix_sklearn(estimator, X_test, y_test)
plt.figure(figsize=(15, 12))
tree.plot_tree(
estimator,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=True,
class_names=True,
)
plt.show()
clf = DecisionTreeClassifier(random_state=1)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.00000 | 0.00757 |
| 1 | 0.00000 | 0.00757 |
| 2 | 0.00000 | 0.00757 |
| 3 | 0.00000 | 0.00757 |
| 4 | 0.00000 | 0.00757 |
| ... | ... | ... |
| 1371 | 0.00667 | 0.28690 |
| 1372 | 0.01304 | 0.29994 |
| 1373 | 0.01726 | 0.31720 |
| 1374 | 0.02399 | 0.36518 |
| 1375 | 0.07658 | 0.44176 |
1376 rows × 2 columns
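For reference, the two columns above come from minimal cost-complexity pruning as described in the scikit-learn documentation. The notation below is an assumption introduced here, not taken from the notebook: $R(T)$ is the total sample-weighted impurity of the leaves of tree $T$, $|\widetilde{T}|$ is its number of leaves, and $T_t$ is the subtree rooted at node $t$.

```latex
% Cost-complexity measure minimised for a given alpha
R_\alpha(T) = R(T) + \alpha \, |\widetilde{T}|

% Effective alpha of an internal node t: the value of alpha at which
% collapsing T_t into the single leaf t costs the same as keeping the subtree
\alpha_{\mathrm{eff}}(t) = \frac{R(t) - R(T_t)}{|\widetilde{T}_t| - 1}
```

At each step the node with the smallest effective alpha (the weakest link) is pruned. Under this reading, `ccp_alphas` lists the alpha values at which the pruned tree changes and `impurities` is the corresponding $R(T)$, which is why the impurity grows as alpha increases.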
fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
Next, we train a decision tree using the effective alphas. The last value
in ccp_alphas is the alpha value that prunes the whole tree,
leaving the tree, clfs[-1], with one node.
clfs = []
for ccp_alpha in ccp_alphas:
clf = DecisionTreeClassifier(random_state=1, ccp_alpha=ccp_alpha)
clf.fit(X_train, y_train)
clfs.append(clf)
print(
"Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
clfs[-1].tree_.node_count, ccp_alphas[-1]
)
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.07657789477371368
For the remainder, we remove the last element in
clfs and ccp_alphas, because it is the trivial tree with only one
node. Here we show that the number of nodes and the tree depth decrease as alpha
increases.
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
f1_train = []
for clf in clfs:
pred_train = clf.predict(X_train)
values_train = f1_score(y_train, pred_train)
f1_train.append(values_train)
f1_test = []
for clf in clfs:
pred_test = clf.predict(X_test)
values_test = f1_score(y_test, pred_test)
f1_test.append(values_test)
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("F1_Score")
ax.set_title("F1_score vs alpha for training and testing sets")
ax.plot(ccp_alphas, f1_train, marker="o", label="train", drawstyle="steps-post")
ax.plot(ccp_alphas, f1_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
# select the model with the highest F1 score on the test set
index_best_model = np.argmax(f1_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=0.00011632014003514709, random_state=1)
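The selection above picks the alpha that maximises F1 on the test set. A possible alternative, sketched below on synthetic data, is to choose `ccp_alpha` by cross-validation on the training data only and keep the test set for the final evaluation. This is a swapped-in technique and not what the notebook does. The data and the names `X_syn`, `y_syn`, `candidate_alphas`, `search`, `cv_best_alpha` and `test_f1` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the booking data (assumption: NOT the hotel data)
X_syn, y_syn = make_classification(n_samples=500, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=1, stratify=y_syn
)

# Candidate alphas from the pruning path of the training data
path_syn = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(X_tr, y_tr)

# Drop the last alpha (single-node tree), clip any tiny negative values caused by
# floating-point error, and remove duplicates
candidate_alphas = np.unique(np.clip(path_syn.ccp_alphas[:-1], 0, None))

# 5-fold cross-validated F1 on the training data decides the alpha
search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    {"ccp_alpha": candidate_alphas},
    scoring="f1",
    cv=5,
)
search.fit(X_tr, y_tr)

cv_best_alpha = search.best_params_["ccp_alpha"]
test_f1 = f1_score(y_te, search.predict(X_te))  # test set used once, for reporting only
print("alpha chosen by CV:", cv_best_alpha, "| test F1:", round(test_f1, 4))
```

The design choice here is that the test score stays an untouched estimate of generalisation, at the cost of fitting one tree per candidate alpha per fold.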
decision_tree_postpruned_perf_train = model_performance_classification_sklearn(
best_model, X_train, y_train
)
decision_tree_postpruned_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.90182 | 0.83044 | 0.86596 | 0.84783 |
confusion_matrix_sklearn(best_model, X_train, y_train)
decision_tree_postpruned_perf_test = model_performance_classification_sklearn(
best_model, X_test, y_test
)
decision_tree_postpruned_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.88303 | 0.79841 | 0.83319 | 0.81543 |
confusion_matrix_sklearn(best_model, X_test, y_test)
plt.figure(figsize=(15, 12))
tree.plot_tree(
best_model,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=True,
class_names=True,
)
plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(best_model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_weekend_nights <= 0.50 | | | | | |--- avg_price_per_room <= 179.47 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- lead_time <= 16.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- weights: [509.00, 30.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [42.00, 5.00] class: 0 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- weights: [1.00, 3.00] class: 1 | | | | | | | |--- lead_time > 16.50 | | | | | | | | |--- avg_price_per_room <= 135.00 | | | | | | | | | |--- weights: [162.00, 36.00] class: 0 | | | | | | | | |--- avg_price_per_room > 135.00 | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- weights: [1606.00, 0.00] class: 0 | | | | | |--- avg_price_per_room > 179.47 | | | | | | |--- avg_price_per_room <= 179.78 | | | | | | | |--- weights: [1.00, 17.00] class: 1 | | | | | | |--- avg_price_per_room > 179.78 | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | |--- no_of_weekend_nights > 0.50 | | | | | |--- lead_time <= 68.50 | | | | | | |--- no_of_weekend_nights <= 4.50 | | | | | | | |--- lead_time <= 1.50 | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | |--- weights: [95.00, 5.00] class: 0 | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | | |--- weights: [10.00, 2.00] class: 0 | | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | | |--- arrival_date <= 28.50 | | | | | | | | | | | |--- weights: [0.00, 33.00] class: 1 | | | | | | | | | | |--- 
arrival_date > 28.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- lead_time > 1.50 | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | |--- weights: [616.00, 68.00] class: 0 | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | | |--- weights: [22.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | |--- weights: [524.00, 16.00] class: 0 | | | | | | |--- no_of_weekend_nights > 4.50 | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | |--- lead_time > 68.50 | | | | | | |--- avg_price_per_room <= 99.98 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- avg_price_per_room <= 62.50 | | | | | | | | | |--- weights: [21.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 62.50 | | | | | | | | | |--- lead_time <= 77.00 | | | | | | | | | | |--- weights: [1.00, 9.00] class: 1 | | | | | | | | | |--- lead_time > 77.00 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- weights: [103.00, 8.00] class: 0 | | | | | | |--- avg_price_per_room > 99.98 | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | |--- weights: [1.00, 52.00] class: 1 | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | |--- avg_price_per_room <= 105.20 | | | | | | | | | |--- arrival_date <= 22.00 | | | | | | | | | | |--- weights: [3.00, 1.00] class: 0 | | | | | | | | | |--- arrival_date > 22.00 | | | | | | | | | | |--- weights: [1.00, 22.00] class: 1 | | | | | | | | |--- avg_price_per_room > 105.20 | | | | | | | | | |--- lead_time <= 88.50 | | | | | | | | | | |--- arrival_date <= 3.00 | | | | | 
| | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- arrival_date > 3.00 | | | | | | | | | | | |--- weights: [29.00, 2.00] class: 0 | | | | | | | | | |--- lead_time > 88.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 117.50 | | | | | |--- avg_price_per_room <= 93.58 | | | | | | |--- arrival_date <= 6.50 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- weights: [3.00, 70.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [3.00, 1.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- weights: [5.00, 1.00] class: 0 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- weights: [39.00, 2.00] class: 0 | | | | | | |--- arrival_date > 6.50 | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- weights: [26.00, 1.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | | | | |--- lead_time <= 97.50 | | | | | | | | | | | |--- weights: [3.00, 1.00] class: 0 | | | | | | | | | | |--- lead_time > 97.50 | | | | | | | | | | | |--- weights: [0.00, 39.00] class: 1 | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- weights: [180.00, 19.00] class: 0 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- lead_time <= 96.00 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 96.00 | | | | | | | | | | | |--- weights: [2.00, 7.00] class: 1 | | | | | | | | |--- type_of_meal_plan_Meal 
Plan 2 > 0.50 | | | | | | | | | |--- avg_price_per_room <= 82.50 | | | | | | | | | | |--- weights: [1.00, 7.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 82.50 | | | | | | | | | | |--- weights: [12.00, 2.00] class: 0 | | | | | |--- avg_price_per_room > 93.58 | | | | | | |--- arrival_date <= 16.50 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- weights: [36.00, 11.00] class: 0 | | | | | | | |--- arrival_month > 7.50 | | | | | | | | |--- avg_price_per_room <= 108.50 | | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | | |--- weights: [10.00, 9.00] class: 0 | | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | | |--- weights: [0.00, 47.00] class: 1 | | | | | | | | |--- avg_price_per_room > 108.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- weights: [42.00, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | | |--- weights: [4.00, 3.00] class: 0 | | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | | |--- weights: [1.00, 28.00] class: 1 | | | | | | |--- arrival_date > 16.50 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- avg_price_per_room <= 127.39 | | | | | | | | | |--- weights: [8.00, 83.00] class: 1 | | | | | | | | |--- avg_price_per_room > 127.39 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- weights: [8.00, 7.00] class: 0 | | | | |--- lead_time > 117.50 | | | | | |--- no_of_week_nights <= 1.50 | | | | | | |--- arrival_date <= 7.50 | | | | | | | |--- weights: [51.00, 0.00] class: 0 | | | | | | |--- arrival_date > 7.50 | | | | | | | |--- avg_price_per_room <= 93.58 | | | | | | | | |--- avg_price_per_room <= 65.38 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- avg_price_per_room > 65.38 | | | | | | | | | |--- weights: [33.00, 2.00] class: 0 | | | | | | | |--- avg_price_per_room > 93.58 | | | | 
| | | | |--- arrival_date <= 28.00 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- weights: [1.00, 18.00] class: 1 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [19.00, 30.00] class: 1 | | | | | | | | |--- arrival_date > 28.00 | | | | | | | | | |--- weights: [13.00, 1.00] class: 0 | | | | | |--- no_of_week_nights > 1.50 | | | | | | |--- no_of_adults <= 1.50 | | | | | | | |--- weights: [113.00, 0.00] class: 0 | | | | | | |--- no_of_adults > 1.50 | | | | | | | |--- lead_time <= 125.50 | | | | | | | | |--- avg_price_per_room <= 90.85 | | | | | | | | | |--- avg_price_per_room <= 87.50 | | | | | | | | | | |--- weights: [18.00, 9.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 87.50 | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | | |--- avg_price_per_room > 90.85 | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | | |--- lead_time > 125.50 | | | | | | | | |--- weights: [161.00, 13.00] class: 0 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 13.50 | | | | |--- avg_price_per_room <= 119.42 | | | | | |--- arrival_month <= 8.50 | | | | | | |--- arrival_month <= 1.50 | | | | | | | |--- weights: [128.00, 0.00] class: 0 | | | | | | |--- arrival_month > 1.50 | | | | | | | |--- lead_time <= 3.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- weights: [217.00, 30.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- avg_price_per_room <= 75.46 | | | | | | | | | | |--- weights: [1.00, 12.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 75.46 | | | | | | | | | | |--- weights: [20.00, 5.00] class: 0 | | | | | | | |--- lead_time > 3.50 | | | | | | | | |--- avg_price_per_room <= 99.38 | | | | | | | | | |--- weights: [96.00, 39.00] class: 0 | | | | | | | | |--- avg_price_per_room > 99.38 | | | | | | | | | |--- avg_price_per_room <= 117.25 | | | | | | | | | | |--- weights: 
[22.00, 38.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 117.25 | | | | | | | | | | |--- weights: [13.00, 2.00] class: 0 | | | | | |--- arrival_month > 8.50 | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | |--- weights: [285.00, 13.00] class: 0 | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 119.42 | | | | | |--- lead_time <= 3.50 | | | | | | |--- avg_price_per_room <= 178.78 | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | |--- weights: [170.00, 23.00] class: 0 | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- avg_price_per_room > 178.78 | | | | | | | |--- avg_price_per_room <= 182.00 | | | | | | | | |--- weights: [9.00, 17.00] class: 1 | | | | | | | |--- avg_price_per_room > 182.00 | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | |--- lead_time > 3.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- weights: [46.00, 113.00] class: 1 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- weights: [14.00, 22.00] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 14.00 | | | | | | | | |--- weights: [38.00, 6.00] class: 0 | | | |--- lead_time > 13.50 | | | | |--- avg_price_per_room <= 105.27 | | | | | |--- avg_price_per_room <= 60.07 | | | | | | |--- lead_time <= 84.50 | | | | | | | |--- weights: [70.00, 5.00] class: 0 | | | | | | |--- lead_time > 84.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- arrival_date <= 19.00 | | | | | | | | | |--- weights: [1.00, 8.00] class: 1 | | | | | | | | |--- arrival_date > 19.00 | | | | | | | | | |--- weights: 
[8.00, 2.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- weights: [14.00, 1.00] class: 0 | | | | | |--- avg_price_per_room > 60.07 | | | | | | |--- lead_time <= 25.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [29.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- weights: [30.00, 2.00] class: 0 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- weights: [59.00, 83.00] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [54.00, 0.00] class: 0 | | | | | | |--- lead_time > 25.50 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- lead_time <= 60.50 | | | | | | | | | | | |--- weights: [58.00, 3.00] class: 0 | | | | | | | | | | |--- lead_time > 60.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- arrival_month <= 5.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 5.00 | | | | | | | | | | |--- weights: [1.00, 37.00] class: 1 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- weights: [6.00, 1.00] class: 0 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | 
| |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 105.27 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- arrival_month <= 10.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- weights: [20.00, 5.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- weights: [0.00, 13.00] class: 1 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 171.22 | | | | | | | | | | |--- arrival_date <= 24.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- arrival_date > 24.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- avg_price_per_room > 171.22 | | | | | | | | | | |--- avg_price_per_room <= 181.24 | | | | | | | | | | | |--- weights: [4.00, 102.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 181.24 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | |--- arrival_date <= 26.50 | | | | | | | | | | |--- weights: [15.00, 5.00] class: 0 | | | | | | | | | |--- arrival_date > 26.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | |--- arrival_month > 10.50 | | | | | | | |--- lead_time <= 22.50 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- weights: [1.00, 4.00] class: 1 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- weights: [22.00, 0.00] class: 0 | | | | | | | |--- lead_time > 22.50 | | | | | | | | |--- avg_price_per_room <= 168.06 | | | | | | | | | |--- avg_price_per_room <= 147.75 | | | | | | | | | | |--- weights: [27.00, 41.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 147.75 
| | | | | | | | | | |--- weights: [0.00, 15.00] class: 1 | | | | | | | | |--- avg_price_per_room > 168.06 | | | | | | | | | |--- weights: [16.00, 4.00] class: 0 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- weights: [39.00, 1.00] class: 0 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | |--- weights: [1038.00, 20.00] class: 0 | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | |--- lead_time <= 63.00 | | | | | | |--- weights: [21.00, 1.00] class: 0 | | | | | |--- lead_time > 63.00 | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | |--- market_segment_type_Online > 0.50 | | | | |--- lead_time <= 8.50 | | | | | |--- weights: [1015.00, 71.00] class: 0 | | | | |--- lead_time > 8.50 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- avg_price_per_room <= 127.62 | | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | | |--- lead_time <= 43.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | | |--- weights: [87.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [128.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 43.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | | 
|--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- avg_price_per_room <= 119.12 | | | | | | | | | | |--- weights: [6.00, 22.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 119.12 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 127.62 | | | | | | | |--- lead_time <= 142.50 | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | |--- avg_price_per_room <= 177.15 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 177.15 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | | |--- weights: [61.00, 16.00] class: 0 | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- weights: [34.00, 5.00] class: 0 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- lead_time <= 100.50 | | | | | | | | | | | |--- weights: [49.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 100.50 | | | | | | | | | | | |--- weights: [1.00, 6.00] class: 1 | | | | | | | |--- lead_time > 142.50 | | | | | | | | |--- avg_price_per_room <= 142.65 | | | | | | | | | |--- weights: [6.00, 2.00] class: 0 | | | | | | | | |--- avg_price_per_room > 142.65 | | | | | | | | | |--- weights: [1.00, 12.00] class: 1 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- weights: [180.00, 1.00] class: 0 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- 
weights: [2126.00, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- weights: [312.00, 36.00] class: 0 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | |--- lead_time > 90.50 | | | | |--- no_of_special_requests <= 2.50 | | | | | |--- arrival_month <= 8.50 | | | | | | |--- lead_time <= 150.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | |--- weights: [2.00, 6.00] class: 1 | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | |--- weights: [9.00, 2.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- weights: [235.00, 24.00] class: 0 | | | | | | |--- lead_time > 150.50 | | | | | | | |--- weights: [2.00, 5.00] class: 1 | | | | | |--- arrival_month > 8.50 | | | | | | |--- avg_price_per_room <= 90.42 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | |--- weights: [6.00, 19.00] class: 1 | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | |--- weights: [9.00, 4.00] class: 0 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [21.00, 5.00] class: 0 | | | | | | |--- avg_price_per_room > 90.42 | | | | | | | |--- weights: [107.00, 42.00] class: 0 | | | | |--- no_of_special_requests > 2.50 | | | | | |--- weights: [90.00, 0.00] class: 0 |--- lead_time > 151.50 | |--- avg_price_per_room <= 100.04 | | |--- no_of_special_requests <= 0.50 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- no_of_adults <= 1.50 | | | | | |--- lead_time <= 163.50 | | | | | | |--- arrival_date <= 7.00 | | | | | | | |--- weights: [0.00, 15.00] class: 1 | | | | | | |--- arrival_date > 7.00 | | | | | | | |--- weights: [5.00, 1.00] class: 0 | | | | | |--- lead_time > 163.50 | | | | | | |--- lead_time <= 341.00 | | | | | | | |--- lead_time <= 173.00 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- weights: [63.00, 
6.00] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- arrival_month <= 5.00 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 5.00 | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- weights: [262.00, 8.00] class: 0 | | | | | | |--- lead_time > 341.00 | | | | | | | |--- no_of_week_nights <= 4.00 | | | | | | | | |--- weights: [17.00, 10.00] class: 0 | | | | | | | |--- no_of_week_nights > 4.00 | | | | | | | | |--- weights: [1.00, 8.00] class: 1 | | | | |--- no_of_adults > 1.50 | | | | | |--- avg_price_per_room <= 84.58 | | | | | | |--- lead_time <= 244.00 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- arrival_date <= 19.00 | | | | | | | | | | |--- lead_time <= 166.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 166.50 | | | | | | | | | | | |--- weights: [1.00, 38.00] class: 1 | | | | | | | | | |--- arrival_date > 19.00 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- weights: [24.00, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- arrival_date <= 16.00 | | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | | | |--- arrival_date > 16.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [17.00, 2.00] class: 0 | | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | | |--- weights: [123.00, 9.00] class: 0 | | | | | | |--- lead_time > 244.00 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [34.00, 0.00] class: 0 | | | | | | 
| | |--- arrival_year > 2017.50
| | | | | | | | | |--- avg_price_per_room <= 80.38
| | | | | | | | | | |--- no_of_week_nights <= 3.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- no_of_week_nights > 3.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- avg_price_per_room > 80.38
| | | | | | | | | | |--- weights: [11.00, 0.00] class: 0
| | | | | | | |--- arrival_month > 11.50
| | | | | | | | |--- weights: [37.00, 0.00] class: 0
| | | | | |--- avg_price_per_room > 84.58
| | | | | | |--- arrival_month <= 11.50
| | | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50
| | | | | | | | | |--- weights: [10.00, 313.00] class: 1
| | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50
| | | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | | |--- arrival_month <= 6.50
| | | | | | | | | |--- weights: [0.00, 13.00] class: 1
| | | | | | | | |--- arrival_month > 6.50
| | | | | | | | | |--- weights: [14.00, 0.00] class: 0
| | | | | | |--- arrival_month > 11.50
| | | | | | | |--- weights: [9.00, 0.00] class: 0
| | | |--- market_segment_type_Online > 0.50
| | | | |--- avg_price_per_room <= 35.17
| | | | | |--- no_of_weekend_nights <= 0.50
| | | | | | |--- weights: [3.00, 6.00] class: 1
| | | | | |--- no_of_weekend_nights > 0.50
| | | | | | |--- weights: [9.00, 0.00] class: 0
| | | | |--- avg_price_per_room > 35.17
| | | | | |--- weights: [5.00, 599.00] class: 1
| | |--- no_of_special_requests > 0.50
| | | |--- no_of_weekend_nights <= 0.50
| | | | |--- lead_time <= 180.50
| | | | | |--- weights: [60.00, 8.00] class: 0
| | | | |--- lead_time > 180.50
| | | | | |--- market_segment_type_Online <= 0.50
| | | | | | |--- weights: [17.00, 4.00] class: 0
| | | | | |--- market_segment_type_Online > 0.50
| | | | | | |--- no_of_special_requests <= 2.50
| | | | | | | |--- arrival_month <= 11.50
| | | | | | | | |--- avg_price_per_room <= 44.12
| | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 44.12
| | | | | | | | | |--- weights: [0.00, 125.00] class: 1
| | | | | | | |--- arrival_month > 11.50
| | | | | | | | |--- weights: [8.00, 11.00] class: 1
| | | | | | |--- no_of_special_requests > 2.50
| | | | | | | |--- weights: [12.00, 0.00] class: 0
| | | |--- no_of_weekend_nights > 0.50
| | | | |--- market_segment_type_Online <= 0.50
| | | | | |--- weights: [151.00, 5.00] class: 0
| | | | |--- market_segment_type_Online > 0.50
| | | | | |--- no_of_week_nights <= 9.50
| | | | | | |--- arrival_month <= 11.50
| | | | | | | |--- arrival_date <= 27.50
| | | | | | | | |--- weights: [269.00, 47.00] class: 0
| | | | | | | |--- arrival_date > 27.50
| | | | | | | | |--- no_of_week_nights <= 1.50
| | | | | | | | | |--- lead_time <= 224.50
| | | | | | | | | | |--- weights: [1.00, 10.00] class: 1
| | | | | | | | | |--- lead_time > 224.50
| | | | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | | | |--- no_of_week_nights > 1.50
| | | | | | | | | |--- lead_time <= 269.00
| | | | | | | | | | |--- weights: [35.00, 8.00] class: 0
| | | | | | | | | |--- lead_time > 269.00
| | | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | |--- arrival_month > 11.50
| | | | | | | |--- arrival_date <= 14.50
| | | | | | | | |--- weights: [11.00, 2.00] class: 0
| | | | | | | |--- arrival_date > 14.50
| | | | | | | | |--- weights: [15.00, 19.00] class: 1
| | | | | |--- no_of_week_nights > 9.50
| | | | | | |--- weights: [1.00, 7.00] class: 1
| |--- avg_price_per_room > 100.04
| | |--- arrival_month <= 11.50
| | | |--- no_of_special_requests <= 2.50
| | | | |--- weights: [0.00, 2108.00] class: 1
| | | |--- no_of_special_requests > 2.50
| | | | |--- weights: [31.00, 0.00] class: 0
| | |--- arrival_month > 11.50
| | | |--- no_of_special_requests <= 0.50
| | | | |--- weights: [47.00, 0.00] class: 0
| | | |--- no_of_special_requests > 0.50
| | | | |--- arrival_date <= 24.50
| | | | | |--- weights: [5.00, 0.00] class: 0
| | | | |--- arrival_date > 24.50
| | | | | |--- weights: [5.00, 15.00] class: 1
# Importance of each feature in building the tree. A feature's importance is computed as the
# (normalized) total reduction of the criterion brought by that feature; it is also known as the Gini importance.
print(
pd.DataFrame(
best_model.feature_importances_, columns=["Imp"], index=X_train.columns
).sort_values(by="Imp", ascending=False)
)
| | Imp |
|---|---|
| lead_time | 0.40262 |
| avg_price_per_room | 0.15089 |
| market_segment_type_Online | 0.13977 |
| no_of_special_requests | 0.09852 |
| arrival_month | 0.05778 |
| arrival_date | 0.03758 |
| no_of_weekend_nights | 0.03037 |
| no_of_adults | 0.02487 |
| no_of_week_nights | 0.01988 |
| arrival_year | 0.01336 |
| required_car_parking_space | 0.00931 |
| type_of_meal_plan_Meal Plan 2 | 0.00402 |
| type_of_meal_plan_Not Selected | 0.00354 |
| room_type_reserved_Room_Type 4 | 0.00334 |
| market_segment_type_Offline | 0.00167 |
| no_of_children | 0.00101 |
| room_type_reserved_Room_Type 5 | 0.00088 |
| room_type_reserved_Room_Type 2 | 0.00060 |
| no_of_previous_bookings_not_canceled | 0.00000 |
| no_of_previous_cancellations | 0.00000 |
| repeated_guest | 0.00000 |
| room_type_reserved_Room_Type 3 | 0.00000 |
| room_type_reserved_Room_Type 6 | 0.00000 |
| room_type_reserved_Room_Type 7 | 0.00000 |
| market_segment_type_Complementary | 0.00000 |
| market_segment_type_Corporate | 0.00000 |
| type_of_meal_plan_Meal Plan 3 | 0.00000 |
importances = best_model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
Lead time (0.40), average price per room (0.15), the online market segment (0.14), and the number of special requests (0.10) remain the most important features in the post-pruned tree, followed by arrival month, arrival date, number of weekend nights, and number of week nights.
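To make "most important" a little more concrete, the sketch below computes the cumulative share of importance from the values printed above. The numbers are copied by hand from that output rather than read from `best_model`, and the names `reported_importances`, `cumulative_share`, and `n_features_for_90` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Importances copied from the printed output above.
# Features with an importance of 0.00000 are omitted.
reported_importances = pd.Series(
    {
        "lead_time": 0.40262,
        "avg_price_per_room": 0.15089,
        "market_segment_type_Online": 0.13977,
        "no_of_special_requests": 0.09852,
        "arrival_month": 0.05778,
        "arrival_date": 0.03758,
        "no_of_weekend_nights": 0.03037,
        "no_of_adults": 0.02487,
        "no_of_week_nights": 0.01988,
        "arrival_year": 0.01336,
        "required_car_parking_space": 0.00931,
        "type_of_meal_plan_Meal Plan 2": 0.00402,
        "type_of_meal_plan_Not Selected": 0.00354,
        "room_type_reserved_Room_Type 4": 0.00334,
        "market_segment_type_Offline": 0.00167,
        "no_of_children": 0.00101,
        "room_type_reserved_Room_Type 5": 0.00088,
        "room_type_reserved_Room_Type 2": 0.00060,
    }
).sort_values(ascending=False)

# Running total of importance, from the most to the least important feature
cumulative_share = reported_importances.cumsum()

# Smallest number of top-ranked features whose importances add up to at least 0.90
n_features_for_90 = int((cumulative_share < 0.90).sum()) + 1

print(cumulative_share.head(n_features_for_90))
print("Features needed to reach 90% of total importance:", n_features_for_90)
```

In a live notebook the same calculation could start from `pd.Series(best_model.feature_importances_, index=X_train.columns)` instead of hand-copied values.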
# training performance comparison
models_train_comp_df = pd.concat(
[
decision_tree_perf_train.T,
decision_tree_tune_perf_train.T,
decision_tree_postpruned_perf_train.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Decision Tree sklearn",
"Decision Tree (Pre-Pruning)",
"Decision Tree (Post-Pruning)",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Decision Tree sklearn | Decision Tree (Pre-Pruning) | Decision Tree (Post-Pruning) |
|---|---|---|---|
| Accuracy | 0.99421 | 0.83085 | 0.90182 |
| Recall | 0.98661 | 0.78608 | 0.83044 |
| Precision | 0.99578 | 0.72401 | 0.86596 |
| F1 | 0.99117 | 0.75377 | 0.84783 |
# test performance comparison
# (stored under its own name so the training comparison above is not overwritten)
models_test_comp_df = pd.concat(
[
decision_tree_perf_test.T,
decision_tree_tune_perf_test.T,
decision_tree_postpruned_perf_test.T,
],
axis=1,
)
models_test_comp_df.columns = [
"Decision Tree sklearn",
"Decision Tree (Pre-Pruning)",
"Decision Tree (Post-Pruning)",
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| | Decision Tree sklearn | Decision Tree (Pre-Pruning) | Decision Tree (Post-Pruning) |
|---|---|---|---|
| Accuracy | 0.87062 | 0.83497 | 0.88303 |
| Recall | 0.80892 | 0.78336 | 0.79841 |
| Precision | 0.79492 | 0.72758 | 0.83319 |
| F1 | 0.80186 | 0.75444 | 0.81543 |
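One way to read the two comparison tables together is to subtract the test scores from the training scores: a large positive gap suggests overfitting. The sketch below rebuilds both tables from the reported numbers (typed in by hand, not taken from the notebook's `*_perf_train` and `*_perf_test` objects) and computes that gap. The names `train_scores`, `test_scores`, `gap`, and `best_test_f1` are assumptions made for illustration.

```python
import pandas as pd

model_names = [
    "Decision Tree sklearn",
    "Decision Tree (Pre-Pruning)",
    "Decision Tree (Post-Pruning)",
]
metric_names = ["Accuracy", "Recall", "Precision", "F1"]

# Values copied from the training performance comparison table
train_scores = pd.DataFrame(
    [
        [0.99421, 0.83085, 0.90182],
        [0.98661, 0.78608, 0.83044],
        [0.99578, 0.72401, 0.86596],
        [0.99117, 0.75377, 0.84783],
    ],
    index=metric_names,
    columns=model_names,
)

# Values copied from the test set performance comparison table
test_scores = pd.DataFrame(
    [
        [0.87062, 0.83497, 0.88303],
        [0.80892, 0.78336, 0.79841],
        [0.79492, 0.72758, 0.83319],
        [0.80186, 0.75444, 0.81543],
    ],
    index=metric_names,
    columns=model_names,
)

# Positive values mean the model scores higher on training data than on test data
gap = (train_scores - test_scores).round(5)
print("Train minus test:")
print(gap)

# Model with the highest F1 score on the test set
best_test_f1 = test_scores.loc["F1"].idxmax()
print("Best test F1:", best_test_f1)
```

On these numbers the default tree shows the largest gap (F1 falls from about 0.99 to 0.80), the pre-pruned tree shows almost none, and the post-pruned tree has the highest test F1 with a moderate gap of about 0.03.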